Spot The Difference - Documentation

User Guide - Table of Contents

1. What is Spot The Difference
2. Installation

2.1 Unix/Linux
2.2 Windows

3. Databases

3.1 dbm File
3.2 SQLite File
3.3 MySQL Database
3.4 PostgreSQL Database

4. Configuration File

4.1 Variables

4.1.1 dbm Variables
4.1.2 SQLite Variables
4.1.3 MySQL Database Variables
4.1.4 PostgreSQL Variables

4.2 Rules

4.2.1 Checks

5. Usage
6. Final Report
7. Bugs
8. Author and Copyright

1. What is Spot The Difference

Spot The Difference is a file integrity checker. Its goal is to detect signs of intrusion by looking for suspicious changes in system files. To carry out an attack, or simply to make sure they can get back into the system later, crackers often modify configuration files, executables and/or log files (usually with rootkits), and in doing so leave traces of the break-in.

An integrity checker works in two phases:

the update phase, in which it creates a database reflecting the state of the filesystem at a moment when the system administrator is sure of its integrity;
the check phase, in which the current state of the filesystem is compared to the database records. This is where it has to spot the difference between two almost identical filesystems: the 'real' filesystem and the 'virtual' filesystem stored in the database.

Information about what files to check, which checks to perform and the connection to the database is set out in the configuration file.

The database created during the update phase can't be modified (you can't add, remove or change records in it). You can, of course, create a new database that reflects the new state of the filesystem.
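The two phases can be illustrated with a minimal sketch. It is not the stdiff implementation: the function names snapshot and changed_paths are hypothetical, the 'database' is a plain dictionary, and only a SHA-1 checksum of regular files is recorded.

```python
import hashlib
import os
import tempfile


def snapshot(root):
    """Update phase: map every regular file under root to its SHA-1 digest."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                state[path] = hashlib.sha1(handle.read()).hexdigest()
    return state


def changed_paths(baseline, root):
    """Check phase: return baseline paths whose current digest differs."""
    current = snapshot(root)
    return sorted(p for p in baseline if current.get(p) != baseline[p])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "passwd")
        with open(target, "w") as handle:
            handle.write("root:x:0:0\n")
        baseline = snapshot(root)           # trusted state
        with open(target, "a") as handle:   # simulated tampering
            handle.write("evil:x:0:0\n")
        print(changed_paths(baseline, root))
```

The baseline is never edited in place; a new trusted state is obtained by calling snapshot again, which mirrors the rule that the database is only ever recreated.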

2. Installation

Spot The Difference is written entirely in Python, so you need to have Python installed (at least 2.3.x). If you don't already have it, you can download it from the Python website. Using dbm database files to store file information doesn't require the installation of any additional Python module. If you wish to use another database (MySQL, PostgreSQL and SQLite are supported), you might need to install database-specific modules.

Python database modules used by Spot The Difference are:

mysql-python - MySQL python module
psycopg - PostgreSQL python module
sqlite-python - SQLite python module

2.1 Unix/Linux

To install Spot The Difference on a Unix/Linux system follow these few steps:

unpack the tarball:
# tar -zxvf stdiff-0.2.1.tar.gz
move to the newly created directory:
# cd ./stdiff-0.2.1
run the install script by typing:
# python setup.py install

This will copy all modules to the third-party modules directory and the scripts to the local executables directory (usually /usr/local/bin on UN*X systems). A sample configuration file (stdiff.conf.sample) will be copied to /etc.

2.2 Windows

To install Spot The Difference on a Windows system, just run the graphical installer (stdiff-0.2.1.win32.exe); you will be asked a couple of questions:

Python Directory (mine is C:\Python23)
Installation Directory (default is <python_directory>\Lib\site-packages\)

After a couple of 'next', the installer will copy the scripts in the python scripts directory (<python_dir>\Scripts\) and the modules in the third-party modules directory (<python_dir>\Lib\site-packages\). A sample configuration file (stdiff.conf.sample) will be copied to <python_dir>\etc\.

3. Databases

The next step after installation is to create the database that will hold files information. Spot The Difference supports most of the open source databases (MySQL, PostgreSQL, SQLite and dbm files). If you want to use dbm or SQLite files, you don't need to create the database now: it will be automatically created at runtime.

The advantage of dbm files is their simplicity and portability. You can find a lot of software on the internet for viewing and managing their content, and you don't need to install any additional software or Python module.

SQLite databases are also stored in files and thus don't require setting up a database server. They are much faster than dbm files, but require the installation of an additional Python module.

If you wish to use Spot The Difference with MySQL or PostgreSQL databases, you will need to create the database and the table that will hold files information. To do this, simply run the script

stdiff_install_db db_type

(where db_type can be either mysql or pgsql). It will guide you through the creation of the database and/or tables. You will be prompted to answer a few questions and eventually the database will magically appear.

Using a database server, like MySQL or PostgreSQL, allows you to hold data from multiple monitored machines in a single repository. All machines can query/update a single, centralized database. The security of the database server machine becomes, of course, fundamental. To view the content of the database you can use the database server tools.

Since the configuration file must contain the password to access the database, it is recommended to create/update the database with a privileged user and then run the later checks with an unprivileged user, with only SELECT granted.

3.1 dbm File

dbm (Data Base Management) files are binary databases of key-value pairs. They are local files and their integrity must be preserved by setting them as read-only (read-only NFS, read-only medium, chflags) after their creation.
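As an illustration of the key-value model, the sketch below stores one record per path in a dbm file and reads it back through a read-only handle. It uses the dbm.dumb module from the standard library of current Python releases; the JSON encoding of the record and the helper names store_records and load_record are assumptions made for the example, not the on-disk format stdiff actually uses.

```python
import dbm.dumb
import json


def store_records(db_path, records):
    """Write path -> attributes pairs into a freshly created dbm file."""
    with dbm.dumb.open(db_path, "n") as db:      # "n": always start empty
        for path, attrs in records.items():
            db[path] = json.dumps(attrs)         # values are stored as bytes


def load_record(db_path, path):
    """Open the dbm file for reading only and return one record, or None."""
    with dbm.dumb.open(db_path, "r") as db:
        raw = db.get(path)
    return None if raw is None else json.loads(raw)
```

Opening with the "r" flag protects against accidental writes from the program itself; it does not replace the filesystem-level protections listed above.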

3.2 SQLite File

"SQLite is a small C library that implements a self-contained, embeddable, zero-configuration SQL database engine". SQLite databases are local files and, like dbm files, their integrity must be preserved setting them as read-only (read-only NFS, read-only medium, chflags) after their creation.

3.3 MySQL Database

MySQL is "the world's most popular open source database". After the installation, the command:

stdiff_install_db mysql

will start an interactive script that will drive you through the creation of the database.

You can also create the database and the table yourself. Though you can't change fields names, data types are customizable. These are the default values:

Field	Type	Description
path	VARCHAR(255) BINARY PRIMARY KEY	Full path of the file or directory (255 characters max)
md5	CHAR(32)	md5 file checksum (16 bytes)
sha	CHAR(40)	sha1 file checksum (20 bytes)
st_mode	SMALLINT UNSIGNED	File permissions in decimal format
st_ino	MEDIUMINT UNSIGNED	File inode number (3 bytes: 16777215 max)
st_dev	SMALLINT UNSIGNED	File device (2 bytes: 65535 max)
st_nlink	SMALLINT UNSIGNED	Number of links (2 bytes: 65535 max)
st_uid	INT UNSIGNED	User ID (4 bytes: 4294967295 max)
st_gid	INT UNSIGNED	Group ID (4 bytes: 4294967295 max)
st_size	BIGINT UNSIGNED	File size (8 bytes: 18446744073 GBytes max)
st_atime	INT UNSIGNED	Access time (timestamp: 4 bytes)
st_mtime	INT UNSIGNED	Modification time (timestamp: 4 bytes)
st_ctime	INT UNSIGNED	Change time (timestamp: 4 bytes)

3.4 PostgreSQL Database

"PostgreSQL is an object-relational database management system (ORDBMS) based on POSTGRES, Version 4.2, developed at the University of California at Berkeley Computer Science Department". After the installation, the command

stdiff_install_db pgsql

will start an interactive script that will drive you through the creation of the database.

You can also create the database and the table yourself. Though you can't change fields names, data types are customizable. These are the default values:

Field	Type	Description
path	VARCHAR(255) PRIMARY KEY	Full path of the file or directory (255 characters max)
md5	CHAR(32)	md5 file checksum (16 bytes)
sha	CHAR(40)	sha1 file checksum (20 bytes)
st_mode	INT	File permissions in decimal format
st_ino	INT	File inode number (4 bytes)
st_dev	INT	File device (4 bytes)
st_nlink	INT	Number of links (4 bytes)
st_uid	INT	User ID (4 bytes)
st_gid	INT	Group ID (4 bytes)
st_size	BIGINT	File size (8 bytes)
st_atime	INT	Access time (timestamp: 4 bytes)
st_mtime	INT	Modification time (timestamp: 4 bytes)
st_ctime	INT	Change time (timestamp: 4 bytes)

4. Configuration File

The next step after creating the database is to edit the configuration file, which defines the run-time behaviour of Spot The Difference. It includes information about connecting to the database, which files to check and which checks to perform on those files.

It is made up of:

variables: variables provide all the information needed for Spot The Difference to connect to the database. Required information varies from one database to another (see below). Also e-mail notification parameters (server and recipients) are set through variables;
rules: rules contain strings which represent the paths (files and directories) to check and the checks to perform on those paths. Pay close attention when writing the rules: poorly written rules may generate false positives and/or not detect actual intrusions;
comments: comments start with a hash sign (#) and may be inline or take up a whole line.

A sample configuration file (stdiff.conf.sample) is provided with the software and placed in /etc (<python_dir>\etc on Windows systems).
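The three kinds of lines can be told apart with very little code. The following is a rough sketch under stated assumptions, not the parser shipped with stdiff: parse_config is a hypothetical name, a hash sign is assumed never to appear inside a path, and a line counts as a variable only when a plain name precedes the equal sign (so that rules starting with = are not mistaken for assignments).

```python
def parse_config(text):
    """Split configuration text into (variables, rules)."""
    variables, rules = {}, []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop inline and whole-line comments
        if not line:
            continue
        name, sep, value = line.partition("=")
        name = name.strip()
        if sep and name.isidentifier():
            variables[name] = value.strip()    # e.g. db_type = mysql
        else:
            rules.append(tuple(line.split()))  # (path,) or (path, checks)
    return variables, rules
```

For example, parse_config("db_type = mysql\n/etc 5iplzc\n!/etc/motd") yields one variable and two rules.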

4.1 Variables

Variables provide all the information needed for Spot The Difference to connect to the database. Firstly, you have to set the value of the db_type variable to the database type to use (legal values are: dbm, sqlite, mysql and pgsql). For example:

db_type = mysql

The other variables that can be set are login variables (user and passwd), server variables (host, port or unix_socket) and database variables (db and table). Not all database types require the setting of all these variables (e.g. dbm and SQLite database files don't require login or host and port specification). See below for database-specific variables.

If you wish to receive the final report by e-mail (-e option), you have to set a couple of additional variables:

mail_server

containing the SMTP server name or address. If it doesn't use the default port (25), you can specify the port number with the usual syntax server:port. For example:

  mail_server = mailserver.my.domain:2500

mail_recipients

containing a list of whitespace separated e-mail addresses. For example:

  mail_recipients = foo@my.domain bar@my.domain
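A short sketch of how these two values could be interpreted, assuming the server:port syntax and the default port 25 described above. The function name parse_mail_settings is hypothetical, and IPv6 literal addresses are not handled.

```python
def parse_mail_settings(mail_server, mail_recipients):
    """Return (host, port, recipients) from the two configuration values."""
    host, sep, port = mail_server.rpartition(":")
    if not sep:                      # no explicit port: fall back to SMTP default
        host, port = mail_server, "25"
    return host, int(port), mail_recipients.split()
```

With the example values above, this returns the host mailserver.my.domain, the port 2500 and a list of two addresses.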

4.1.1 'dbm' Variables

For a dbm file, you only need to specify its absolute path; it must be assigned to the db variable. E.g.:

db = /root/stdiff/stdiff.dbm

4.1.2 SQLite Variables

If you use a SQLite database file, you need to specify its absolute path (in the db variable) and the name of the table (in the table variable) in which to insert files information. E.g.:

db = /root/stdiff/stdiff.sql
table = my_hostname
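Because the SQLite database is just a file, the db value is all that is needed to open it. The sketch below, written with the sqlite3 module of current Python releases, creates a table named after the table variable and then reopens the file read-only, which is the state recommended for a finished database. The helper names and the reduced two-column layout are assumptions made for the example.

```python
import os
import pathlib
import sqlite3
import stat


def create_database(db, table):
    """Create the file named by db with one table named by table."""
    if not table.isidentifier():     # table names cannot be bound as parameters
        raise ValueError("unsafe table name: %r" % table)
    conn = sqlite3.connect(db)
    conn.execute("CREATE TABLE %s (path TEXT PRIMARY KEY, md5 TEXT)" % table)
    conn.execute("INSERT INTO %s VALUES (?, ?)" % table, ("/etc/motd", "0" * 32))
    conn.commit()
    conn.close()
    os.chmod(db, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)   # mode 0444


def open_read_only(db):
    """Reopen the database through an SQLite URI so that writes are refused."""
    uri = pathlib.Path(db).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

Any INSERT, UPDATE or DELETE issued on the connection returned by open_read_only raises sqlite3.OperationalError.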

4.1.3 MySQL Variables

To connect to a MySQL server, you have to set:

authentication variables:

user
database user to connect with. It is recommended, once the database has been created and populated with a privileged user, to use an unprivileged user (with only SELECT granted) for the later filesystem checks. This avoids leaving information on the filesystem that could help an attacker to modify the database;
passwd
MySQL password;
server connection variables:

host
name or address of the database server
port
TCP port number the database server is listening on.
unix_socket
UNIX socket to connect to the database through.

NOTE: you can't set both the port and unix_socket variables: Spot The Difference wouldn't know how to connect.
database variables:

db
database name;
table
name of the table to insert data into or to retrieve data from.

Configuration file entries for a MySQL server connection would look like this:

user = my_user
passwd = my_password
host = localhost
unix_socket = /var/run/mysql/mysql.sock
db = Spot
table = my_hostname

To connect to the database through a TCP port instead of a socket, the fourth entry would have been:

port = 3306
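The port/unix_socket restriction is easy to enforce before any connection is attempted. This is a minimal sketch, assuming the variables have already been read into a dictionary of strings; mysql_connect_args is a hypothetical helper, and the keys it returns simply reuse the variable names from this section.

```python
def mysql_connect_args(variables):
    """Build connection arguments, rejecting ambiguous server settings."""
    if "port" in variables and "unix_socket" in variables:
        raise ValueError("set either port or unix_socket, not both")
    args = {key: variables[key]
            for key in ("user", "passwd", "host", "db") if key in variables}
    if "port" in variables:
        args["port"] = int(variables["port"])            # TCP connection
    elif "unix_socket" in variables:
        args["unix_socket"] = variables["unix_socket"]   # local socket
    return args
```

The table variable is deliberately left out, since it names a table inside the database rather than a connection parameter.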

4.1.4 PostgreSQL Variables

To connect to a PostgreSQL server, you have to set:

authentication variables:

user
database user to connect with. It is recommended, once the database has been created and populated with a privileged user, to use an unprivileged user (with only SELECT granted) for the later filesystem checks. This avoids leaving information on the filesystem that could help an attacker to modify the database;
passwd
PostgreSQL password;
server connection variables:

host
name or address of the database server or, if you connect through a UNIX socket, the directory containing the socket;
port
TCP port number the database server is listening on.

NOTE: if you connect through a UNIX socket, you don't need to set the port variable.
database variables:

db
database name;
table
name of the table to insert data into or to retrieve data from.

Configuration file entries for a PostgreSQL server connection (through a UNIX socket) would look like this:

user = my_user
passwd = my_password
host = /tmp
db = Spot
table = my_hostname

To connect to a remote database, you should assign its name or address to the host variable and set the port variable:

host = 1.2.3.4
port = 5432

4.2 Rules

Rules specify the paths (files and directories) to check and the checks to perform. Each rule takes one line and consists of one or two whitespace separated fields:

the first field is always a pathname. If it is a directory, the rule extends to all files and directories below it, recursively. The pathname may contain all the usual UNIX file globbing characters (*, ?, []) to include all pathnames matching that pattern. No whitespace is allowed inside a path (? can be used instead);
the second field is required only by some rules (see below) and it is a string specifying which checks must be performed on the path in the first field;

There are four types of rules, identified by their prefix:

No prefix

Rules with no prefix are 'root rules'. They specify which files and directories must be checked. They are made up of a pathname and a checks string. If the pathname is a directory, checks extend to all files and directories below it, recursively. There can be any number of root rules. The following rule:

   /etc      5iplzc

means that all files and directories in /etc must be checked, recursively. For the meaning of the checks string, see below.

!

Pathnames preceded by an exclamation mark are ignored. These rules are only made up of a path. If it is a directory, everything below it is ignored. The following rules:

  /etc         5iplzc
  !/etc/motd
  !/etc/X11

check all the files and directories in /etc except the file /etc/motd and the whole directory tree below /etc/X11.

$

A directory or a file inside the directory tree of a root rule may need special checks. Simply write the pathname of that file or directory, preceded by a dollar sign, and the checks to perform on that path. The following rules:

  /etc             5iplzc
  $/etc/inetd.conf 5siplzc
  $/etc/ssh        5siplzc

perform extra checks on files inside the directory /etc/ssh and on the file /etc/inetd.conf. Such rules are non-cascading, i.e. they don't get inherited by subdirectories. In the previous example, directories in /etc/ssh (if any) wouldn't inherit the checks from the /etc/ssh rule, but from the /etc rule.

=

Rules made up of an equal sign followed by a directory pathname mean that all directories below that pathname must be ignored. The following rules:

  /etc       5iplzc
  =/etc/X11

check all the /etc directory tree except all directories below /etc/X11. The files in /etc/X11, instead, are checked.
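The four rule types can be distinguished by looking at the first character of the line. The sketch below is only an illustration: parse_rule and the kind labels ('root', 'ignore', 'special', 'ignore_subdirs') are hypothetical names, and the line is assumed to be non-empty with comments already stripped.

```python
RULE_KINDS = {"!": "ignore", "$": "special", "=": "ignore_subdirs"}


def parse_rule(line):
    """Return (kind, path, checks) for one rule line."""
    fields = line.split()
    kind = RULE_KINDS.get(fields[0][0], "root")       # no prefix: root rule
    path = fields[0] if kind == "root" else fields[0][1:]
    checks = fields[1] if len(fields) > 1 else ""
    if kind in ("root", "special") and not checks:
        raise ValueError("rule for %s needs a checks string" % path)
    return kind, path, checks
```

For instance, parse_rule("$/etc/ssh 5siplzc") gives ('special', '/etc/ssh', '5siplzc'), while parse_rule("=/etc/X11") gives ('ignore_subdirs', '/etc/X11', '').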

4.2.1 Checks

As stated previously, some rules must contain the list of checks to perform on a specific pathname. Below is a list of available file checks; each one is identified by a single character:

  5  md5 checksum
  s  sha1 checksum
  p  permissions
  i  inode number
  d  device
  l  number of links
  u  user ID
  g  group ID
  z  size
  a  Most recent access time
  m  Time of the most recent modification of the content of the file
  c  Time of the last modification of inode 'metadata' (on UNIX) or creation
     date (on Windows)

You must specify all the checks you want to be performed (there is no special 'all' string) with no whitespace between. The following rule:

/etc 5sugmc

will:

calculate md5 and sha1 checksums of all the files inside the /etc directory tree and
record the user ID, group ID, modification time and change time of all the directories and files inside the /etc directory tree.

Checksums are calculated only on files, not on directories. Checksum calculation needs to open the file for reading, thus modifying its access time. Combining a checksum check (5 or s) with the access time check (a) in the same rule will therefore lead to a number of false positives.

For critical files, it is recommended to calculate both md5 and sha1 checksums, since it's theoretically possible to modify a file and pad it to leave its checksum unchanged. Don't forget, however, that some rootkits serve up the original file (hidden somewhere) when you open it for reading and the compromised file when you execute it. So pay close attention to new, unexpected files.
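To make the checks string concrete, here is a sketch that turns each character into the corresponding value, using os.stat and hashlib from current Python releases. The function name collect is hypothetical; the record keys reuse the database field names listed in section 3, and the whole file is read into memory for brevity.

```python
import hashlib
import os

# check character -> attribute of the os.stat() result
STAT_CHECKS = {"p": "st_mode", "i": "st_ino", "d": "st_dev", "l": "st_nlink",
               "u": "st_uid", "g": "st_gid", "z": "st_size",
               "a": "st_atime", "m": "st_mtime", "c": "st_ctime"}
# check character -> (record key, hashlib algorithm name)
HASH_CHECKS = {"5": ("md5", "md5"), "s": ("sha", "sha1")}


def collect(path, checks):
    """Gather the values selected by a checks string such as '5sugmc'."""
    info = os.stat(path)             # taken before the file is opened
    record = {}
    for flag in checks:
        if flag in STAT_CHECKS:
            record[STAT_CHECKS[flag]] = getattr(info, STAT_CHECKS[flag])
        elif flag in HASH_CHECKS:
            if os.path.isfile(path):             # no checksums for directories
                key, algorithm = HASH_CHECKS[flag]
                with open(path, "rb") as handle:
                    record[key] = hashlib.new(algorithm, handle.read()).hexdigest()
        else:
            raise ValueError("unknown check: %r" % flag)
    return record
```

Reading the file for a checksum is exactly what may update its access time, so a later run that includes the a check would see a different st_atime even on an untouched file.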

5. Usage

Well, so far we have created the database and edited the configuration file. What we need to do now is to update the database and then schedule a periodic check of the filesystem. The syntax of Spot The Difference is:

    stdiff.py [-h] [-v|-q] [-C config_file] [-c|-u] [-o output_file] [-e]

Almost all parameters are optional. It is necessary, however, to specify whether a filesystem check (-c) or a database update (-u) is required. The options are as follows:

-C, --configfile
Specify the configuration file path. Default is /etc/stdiff.conf

-u, --update
Create a new 'known-state' database or overwrite an existing database

-c, --check
Check filesystem integrity

-o, --outfile
Specify the pathname of the final report. Default is stdiff.out in the current directory

-e, --email
Turn on email notification

-v, --verbose
Verbose mode

-q, --quiet
Quiet (almost dumb) mode

-h, --help
Print a short help message and exit

--version
Print the version number and exit
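The option set above can be modelled with the argparse module of current Python releases. This is only a sketch of the documented command line, not the parsing code found in stdiff.py; build_parser is a hypothetical name and --version is omitted because its output format is not documented here.

```python
import argparse


def build_parser():
    """Model the documented stdiff.py options (-h is added automatically)."""
    parser = argparse.ArgumentParser(prog="stdiff.py")
    noise = parser.add_mutually_exclusive_group()         # [-v|-q]
    noise.add_argument("-v", "--verbose", action="store_true")
    noise.add_argument("-q", "--quiet", action="store_true")
    mode = parser.add_mutually_exclusive_group(required=True)   # -c or -u
    mode.add_argument("-c", "--check", action="store_true")
    mode.add_argument("-u", "--update", action="store_true")
    parser.add_argument("-C", "--configfile", default="/etc/stdiff.conf")
    parser.add_argument("-o", "--outfile", default="stdiff.out")
    parser.add_argument("-e", "--email", action="store_true")
    return parser
```

Calling build_parser().parse_args(["-u"]) keeps both defaults, while passing -c and -u together, or neither, makes argparse exit with a usage error.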

Below are some examples. To update the database, preserving all the default settings, simply run:

# stdiff.py -u

This will parse the default configuration file (/etc/stdiff.conf) and create a new 'known-state' database (or drop and recreate a pre-existing one). The final report will be saved to stdiff.out in the current directory.

If you want to override the default settings, the command:

# stdiff.py -u -o /root/stdiff/stdiff.out -C /root/stdiff/stdiff.conf -v

will update the database taking database parameters and rules from the configuration file /root/stdiff/stdiff.conf (-C option). The name of all the files inserted in the database will be displayed (verbose mode, -v) and the final report will be saved to /root/stdiff/stdiff.out, as specified by the -o option.

Once you have populated the database, you should schedule periodic checks of the filesystem. The command:

# stdiff.py -C /root/stdiff/stdiff.conf -c -o /root/stdiff/stdiff.out -e

will compare the current filesystem to the one recorded in the database. It will save the final report to /root/stdiff/stdiff.out and e-mail it (-e option) to the addresses specified in the configuration file /root/stdiff/stdiff.conf.

6. Final Report

After the creation/update of the database, a detailed report is generated. It contains statistics on the update process:

launch parameters (database, configuration file, etc.);
number of recorded files;
elapsed time;
errors (if any) in calculating checksums or retrieving file data

After a filesystem check, the generated report provides all the above data plus a detailed list of:

missing files;
new files found on the system;
changes to filesystem.
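The three lists of a check report follow directly from comparing the recorded state with the current one. A minimal sketch, assuming both states are dictionaries mapping a path to a dictionary of recorded attributes; the function name classify is hypothetical and the real report layout is not reproduced.

```python
def classify(baseline, current):
    """Return (missing, new, changed) between two path -> attributes maps."""
    missing = sorted(set(baseline) - set(current))   # recorded but gone
    new = sorted(set(current) - set(baseline))       # present but never recorded
    changed = {}
    for path in sorted(set(baseline) & set(current)):
        old, now = baseline[path], current[path]
        diffs = {key: (old[key], now.get(key))
                 for key in old if old[key] != now.get(key)}
        if diffs:
            changed[path] = diffs                    # attribute -> (old, new)
    return missing, new, changed
```

Only attributes present in the baseline are compared, so an attribute that was never recorded cannot trigger a change.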

This is a sample report generated after a database update and this one is generated after a filesystem check.

7. Bugs

Thanks to Jens Engel for pointing out an unhandled exception when a broken symlink was found. Release 0.2.1 has fixed this issue and now, when updating the database, stdiff will report broken links in the final report:

Could not open these files:
[...]
/usr/bin/brokenlink    No such file or directory: '/usr/bin/brokenlink'

Of course this prevents the broken symlink from being inserted into the database. Then, on the next filesystem check, the broken symlink will be considered a new file, unless you delete it or you tell stdiff to ignore it, adding a rule to the configuration file:

!/usr/bin/brokenlink
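A broken symlink can be recognised before any attempt to open it. The sketch below shows one way to do so with os.path; it is not the fix applied in release 0.2.1, and the names is_broken_symlink and split_broken_links are hypothetical.

```python
import os


def is_broken_symlink(path):
    """True when path is a symlink whose target does not exist."""
    return os.path.islink(path) and not os.path.exists(path)


def split_broken_links(paths):
    """Separate usable paths from broken symlinks, for reporting."""
    usable = [p for p in paths if not is_broken_symlink(p)]
    broken = [p for p in paths if is_broken_symlink(p)]
    return usable, broken
```

os.path.exists follows symlinks, which is why it returns False for a link that points nowhere while os.path.islink still returns True.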

Spot The Difference has been tested on *BSD, Linux and Windows systems. Please send bug reports and comments by email.

8. Author and Copyright

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of the developer nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.