4. Content filtering with SquidGuard

SquidGuard is a combined filter, redirector and access controller plugin for Squid. We will use it to block access to specific categories of unwanted sites, based on IP addresses, URLs and regular expressions. SquidGuard comes with a very comprehensive list of commonly-banned web sites, divided into categories such as "porn", "drugs", "ads" and so on, making configuration rather simple and fast.

4.1 Installation

SquidGuard is available through OpenBSD's packages and ports system and requires the installation of the following packages:

The installation places a copy of the blacklists tarball (blacklists.tar.gz) in /usr/local/share/examples/squidguard/dest/. We will extract it into the /var/squidguard/db directory:

# cd /usr/local/share/examples/squidguard/dest/
# mkdir -p /var/squidguard/db
# tar -zxvC /var/squidguard/db -f blacklists.tar.gz
[...]
#

4.2 Configuration

SquidGuard's configuration file is /etc/squidguard/squidguard.conf. It is logically divided into six sections, five of which are used in this setup and described below (please refer to the documentation for a more in-depth look at squidGuard's configuration options):

Path declarations
Specify the path to the logs and blacklists directories:
logdir  /var/squidguard/log
dbhome  /var/squidguard/db
Time space declarations
SquidGuard allows you to have different access rules based on time and/or date. A short example will probably best illustrate the flexibility of these rules.
time workhours {
    weekly  mtwhf  08:00-18:00
}

time night {
    weekly  * 18:00-24:00
    weekly  * 00:00-08:00
}

time holidays {
    date    *.01.01                 # New Year's Day
    date    *.05.01                 # Labour Day
    date    *.12.24 12:00-24:00     # Christmas Eve (short day)
    date    *.12.25                 # Christmas Day
    date    *.12.26                 # Boxing Day
}
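To make the semantics of a weekly rule concrete, the following Python sketch models how a rule such as "weekly mtwhf 08:00-18:00" could be checked against a timestamp. It is an illustrative model only, not squidGuard's own code: the function name in_weekly is hypothetical, and treating the end time as exclusive is an assumption.

```python
from datetime import datetime, time

# squidGuard's one-letter day codes, ordered so that the index matches
# datetime.weekday() (Monday is 0): m, t, w, h (Thursday), f, a (Saturday), s (Sunday)
DAY_CODES = "mtwhfas"

def in_weekly(days, start, end, when):
    """Return True if `when` falls on one of `days` between start and end."""
    if days != "*" and DAY_CODES[when.weekday()] not in days:
        return False
    # The end of the window is treated as exclusive (an assumption of this sketch)
    return start <= when.time() < end

# Model of: weekly mtwhf 08:00-18:00
monday_morning = datetime(2008, 12, 15, 9, 30)    # a Monday
saturday_morning = datetime(2008, 12, 13, 9, 30)  # a Saturday
print(in_weekly("mtwhf", time(8), time(18), monday_morning))    # True
print(in_weekly("mtwhf", time(8), time(18), saturday_morning))  # False
```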
Source group declarations
SquidGuard allows you to filter based on source IP address, domain and user (user credentials are passed by Squid along with the URL); e.g.:
src admin {
    ip      172.16.0.12                # The administrator's PC
    domain  lan.kernel-panic.it        # The LAN domain
    user    root administrator         # The administrator's login names
}

src lan {
    ip      172.16.0.0/24              # The internal network
    domain  lan.kernel-panic.it        # The LAN domain
}
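Note that the administrator's PC (172.16.0.12) also falls inside the 172.16.0.0/24 network, so it matches both groups. The Python sketch below models only the "ip" entries of the two groups above. It is not squidGuard's implementation: the SOURCES table and function names are hypothetical, and trying groups in declaration order is an assumption.

```python
import ipaddress

# Hypothetical model of the source groups above: (name, list of `ip` entries),
# kept in declaration order. `domain` and `user` entries are not modelled.
SOURCES = [
    ("admin", ["172.16.0.12"]),
    ("lan", ["172.16.0.0/24"]),
]

def ip_in_entry(client_ip, entry):
    """Return True if client_ip is covered by a single host or CIDR entry."""
    network = ipaddress.ip_network(entry, strict=False)
    return ipaddress.ip_address(client_ip) in network

def first_matching_source(client_ip):
    """Return the first group whose entries cover client_ip, else None.

    Trying groups in declaration order is an assumption of this sketch.
    """
    for name, entries in SOURCES:
        if any(ip_in_entry(client_ip, entry) for entry in entries):
            return name
    return None

print(first_matching_source("172.16.0.12"))  # admin
print(first_matching_source("172.16.0.50"))  # lan
print(first_matching_source("10.0.0.1"))     # None
```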
Destination group declarations
One of squidGuard's main features is its ability to filter on destination address or domain, and this is where the pre-built databases we extracted earlier come in handy. The domainlist parameter specifies the path to a file containing a list of domain names; the path must be relative to the directory set by the dbhome parameter (later on, we will see how to create the db files that speed up squidGuard's startup). Similarly, the urllist and expressionlist parameters specify the relative paths to files containing lists of URLs and regular expressions, respectively. E.g.:
dest porn {
    domainlist     blacklists/porn/domains
    urllist        blacklists/porn/urls
    expressionlist blacklists/porn/expressions
    # Logged info is anonymized to protect users' privacy
    log anonymous  dest/porn.log
}
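As a rough illustration of what a domain list lookup involves, the Python sketch below assumes that an entry in a domains file covers the listed domain and all of its subdomains. This is a simplified model rather than squidGuard's actual lookup code, and the function name and sample set are hypothetical.

```python
def domain_listed(host, domains):
    """Return True if host, or any parent domain of host, is in `domains`."""
    labels = host.lower().rstrip(".").split(".")
    # Try "www.blocked.site", then "blocked.site", then "site"
    return any(".".join(labels[i:]) in domains for i in range(len(labels)))

# Hypothetical contents of a domains file, loaded into a set
sample = {"blocked.site"}
print(domain_listed("www.blocked.site", sample))  # True
print(domain_listed("notblocked.site", sample))   # False
```

Matching on whole labels rather than on raw string suffixes is what keeps "notblocked.site" from being caught by the "blocked.site" entry.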
Access control rule declarations
Finally, we can combine all the previous rules to build Access Control Lists:
acl {
    admin within workhours {
        # The following rule allows everything except porn, drugs and
        # gambling sites during work hours. '!' is the NOT operator.
        pass !porn !drugs !gambling all
    } else {
        # Outside of work hours drugs and gambling sites are still blocked.
        pass !drugs !gambling all
    }
    lan {
        # The built-in 'in-addr' destination group matches any IP address.
        pass !in-addr !porn !drugs !gambling all
    }
    default {
        # Default deny to reject unknown clients
        pass none
        redirect  http://www.kernel-panic.it/error.html&ip=%a&url=%u
    }
}
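The way a pass list is read can be modelled with a short Python sketch. It assumes that the tokens are examined left to right and that the first matching token decides the outcome, so the trailing "all" only applies when no negated group matched. This is an illustrative model under that assumption, not squidGuard's code, and evaluate_pass is a hypothetical name.

```python
def evaluate_pass(rule, categories):
    """Return True if a request whose destination belongs to `categories`
    is allowed by the pass list `rule` (first matching token wins)."""
    for token in rule.split():
        if token in ("all", "any"):
            return True           # matches every destination
        if token == "none":
            return False          # matches nothing: always block
        negated = token.startswith("!")
        name = token.lstrip("!")
        if name in categories:
            return not negated    # "!porn" blocks, "porn" would allow
    return False                  # nothing matched: block (assumption)

rule = "!porn !drugs !gambling all"
print(evaluate_pass(rule, {"porn"}))  # False: blocked by !porn
print(evaluate_pass(rule, set()))     # True: falls through to "all"
print(evaluate_pass("none", set()))   # False: default deny
```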
The redirect rule declares the URL to which users requesting blocked pages are redirected. SquidGuard can include useful information in that URL by expanding macros; the rule above uses %a (the client's IP address) and %u (the requested URL).
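As an illustration of the expansion, the Python sketch below substitutes the two macros used in the redirect rule above, %a (client IP address) and %u (requested URL). It is a simplified model with a hypothetical function name and handles only these two macros.

```python
def expand_redirect(template, client_ip, url):
    """Substitute the %a and %u macros in a redirect URL template."""
    return template.replace("%a", client_ip).replace("%u", url)

template = "http://www.kernel-panic.it/error.html&ip=%a&url=%u"
print(expand_redirect(template, "1.2.3.4", "http://www.blocked.site"))
# http://www.kernel-panic.it/error.html&ip=1.2.3.4&url=http://www.blocked.site
```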

Now that squidGuard is configured, we can build the Berkeley DB files for the domain and URL lists (expression lists are read as plain text and are not compiled), and then make sure the _squid user owns the resulting files:

# squidGuard -u -C all
# chown -R _squid /var/squidguard/

You can test that the squidGuard configuration is working properly by simulating some Squid requests from the command line; squidGuard expects a single line on stdin with the following format (empty fields are replaced with "-"):

URL client_ip/fqdn user method urlgroup

and returns the configured redirect URL (if the site is blocked) or an empty line; for example:

# echo "http://www.blocked.site 1.2.3.4/- user GET -" | squidGuard -c /etc/squidguard/squidguard.conf -d
[ ... ]
2008-12-14 09:57:04 [27349] squidGuard ready for requests (1197622624.065)
http://www.kernel-panic.it/error.html&ip=1.2.3.4&url=http://www.blocked.site 1.2.3.4/- user GET
2008-12-14 09:57:04 [27349] squidGuard stopped (1197622624.067)
# echo "http://www.good.site 1.2.3.4/- user GET -" | squidGuard -c /etc/squidguard/squidguard.conf -d
[ ... ]
2008-12-14 10:30:24 [12046] squidGuard ready for requests (1197624624.421)

2008-12-14 10:30:24 [12046] squidGuard stopped (1197624624.423)
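When many URLs need to be checked, the same test can be scripted. The Python sketch below builds request lines in the format shown above and pipes one to squidGuard. The helper names are hypothetical, the squidGuard binary is assumed to be in the PATH, and reading a non-empty stdout line as "blocked" follows the behaviour described above rather than any formal API.

```python
import subprocess

def request_line(url, client_ip, fqdn="-", user="-", method="GET", urlgroup="-"):
    """Build one request line; empty fields are written as '-'."""
    return f"{url} {client_ip}/{fqdn} {user} {method} {urlgroup}"

def query_squidguard(line, conf="/etc/squidguard/squidguard.conf"):
    """Pipe a single request line to squidGuard and return its stdout lines.

    Requires squidGuard to be installed; not called at import time.
    """
    result = subprocess.run(
        ["squidGuard", "-c", conf],
        input=line + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()

def is_blocked(stdout_lines):
    """A non-empty reply line is taken to be the redirect URL (site blocked)."""
    return any(reply.strip() for reply in stdout_lines)

print(request_line("http://www.blocked.site", "1.2.3.4", user="user"))
# http://www.blocked.site 1.2.3.4/- user GET -
```

On a machine with squidGuard installed, is_blocked(query_squidguard(request_line("http://www.blocked.site", "1.2.3.4"))) would then report whether the URL is redirected.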

If everything is working as expected, we can configure Squid to use squidGuard as the redirector, by editing a few parameters in the /etc/squid/squid.conf file.

/etc/squid/squid.conf
# Path to the redirector program
url_rewrite_program   /usr/local/bin/squidGuard

# Number of redirector processes to spawn
url_rewrite_children  5

# To prevent loops, don't send requests from localhost to the redirector
url_rewrite_access    deny  localhost

and reload the Squid configuration:

# squid -k reconfigure