How will the SAT Competition 2003 proceed?
SAT Competition 2003
February - May, 2003
This is subject to change. Last modification: December 4, 2002.
A new SAT competition will start in February 2003 and continue
until May 2003. The results of the competition will be announced at the SAT 2003 conference
(Italy). This competition follows the SAT Competition 2002, which started on March
10, 2002 and ran until (and during) the SAT 2002 Symposium
(May 6-9, 2002, Cincinnati, Ohio, USA).
Because a competition needs strict winners, this page describes how we
will find them according to the competition rules we have defined. It
explains these rules, how we will choose the "best" SAT solvers (see the
awards below), and which benchmarks will be considered the hardest
(according to our rules).
For further details, please take a look at the technical report
concerning the SAT 2002 competition.
Month          | Week     | N° | Competition phases
February       | 3 to 9   | 0  |
February       | 10 to 16 | 1  | Submission of benchmarks/solvers (Feb 14th)
February       | 17 to 23 | 2  | Compliance testing
February/March | 24 to 2  | 3  | Buggy solvers returned (Feb 28th)
March          | 3 to 9   | 4  | Buggy solvers resubmitted (March 7th)
March          | 10 to 16 | 5  | First stage starts (March 14th)
March          | 17 to 23 | 6  | First stage
March          | 24 to 30 | 7  | First stage
March/April    | 31 to 6  | 8  | First stage
April          | 7 to 13  | 9  | First stage
April          | 14 to 20 | 10 | Second stage starts (April 14th)
April          | 21 to 27 | 11 | Second stage
April/May      | 28 to 4  | 12 | Second stage
May            | 5 to 11  | 13 | Awards revealed (during conference)
As shown in the calendar above, the competition will be 13 weeks long
and divided into 3 rounds:
- The first one (red/white, weeks 1-4) is devoted to compliance
testing of submissions.
- The second one (green, weeks 5-9) is the first stage of the
competition, where each solver will be run on the same set (or a subset)
of submitted benchmarks.
- The third one (blue, weeks 10-12) is the second stage of the
competition. The top N solvers in each category of benchmarks will be run
on the hardest benchmarks in that category (each category being partitioned
into SAT/UNSAT instances).
First round (red/white, weeks 1-4): before the actual
competition starts, solvers and benchmarks will be checked for compliance
(essentially input/output). For each submitter, an account will be opened
on a Linux machine accessible over the web, where compilation can be checked. Please
note that before week 5, no submitted solver will be tested on any
submitted (new) benchmark.
- Solvers: after the first deadline (week 1), solvers will be
tested for bugs (please read carefully what we call a buggy solver)
on previous (not newly submitted) benchmarks. Buggy solvers will then be returned
to their authors (week 3), who will have a chance to fix the bugs.
After possible re-submission (week 4), the solvers are checked again,
still-buggy solvers are excluded, and the actual competition starts (a
solver found buggy even at a later stage is excluded anyway).
Please read the call for solvers carefully for more
details.
- Benchmarks: as described in the call for
benchmarks, there will be three categories of benchmarks (Industrial,
Handmade and Random). Because of the intrinsic limits of incomplete solvers,
a distinction will be made between SAT and UNSAT instances (submitters
will have to specify, for each benchmark or generator, whether it is SAT, UNSAT
or UNKNOWN). During weeks 1-4, Industrial (pre-generated) benchmarks will
be checked for compliance (please note that our CNF format is very
restrictive). For Handmade (crafted) and Random benchmarks, for which generators
are required, weeks 1-4 will also be used to generate instances according
to their hardness for previous solvers.
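The compliance check on pre-generated benchmarks can be illustrated with a small sketch. The competition's exact format rules are not reproduced here (and are stated to be stricter than the common conventions), so this is only a minimal, hypothetical checker for the usual DIMACS CNF layout: comment lines starting with "c", a single "p cnf <vars> <clauses>" header, and clauses written as non-zero integers terminated by 0.

```python
def check_dimacs_cnf(text):
    """Minimal DIMACS CNF compliance check (sketch only; the
    competition's actual format rules are more restrictive)."""
    # Drop blank lines and comment lines starting with 'c'.
    lines = [l for l in text.splitlines() if l.strip() and not l.startswith("c")]
    if not lines or not lines[0].startswith("p cnf"):
        return False
    try:
        _, _, nvars, nclauses = lines[0].split()
        nvars, nclauses = int(nvars), int(nclauses)
    except ValueError:
        return False  # malformed header
    try:
        nums = [int(tok) for tok in " ".join(lines[1:]).split()]
    except ValueError:
        return False  # non-integer token in the clause section
    # Every literal must reference a declared variable, and the number
    # of clause terminators (zeros) must match the header.
    in_range = all(n == 0 or 1 <= abs(n) <= nvars for n in nums)
    return in_range and nums.count(0) == nclauses
```

For example, a two-clause formula over three variables passes, while a file whose literals exceed the declared variable count is rejected.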
Second round, first stage (green, weeks 5-9): this
is the end of the interactive part of the competition. We will run each
solver on each applicable benchmark (i.e., we will not run incomplete solvers
on benchmarks known to be unsatisfiable). If a solver does not answer
within the time limit (to be determined once we know the number of submissions),
it is interrupted. Randomized solvers will be started several times (at
least 3 times) and the median time will be taken into account (the full
time limit is charged for unsuccessful tries). The time limit will be fixed after
week 1, when the number of submissions is known (this number will be
made public).
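The timing rule for randomized solvers can be sketched as follows. This is only an illustrative reconstruction of the rule stated above (the competition's actual scripts are not public): each unsuccessful run is charged the full time limit, and the median of the charged times is kept.

```python
def median_runtime(times, time_limit):
    """Median over several runs of a randomized solver, with runs that
    hit the time limit (marked None) charged at the full limit.
    Illustrative sketch of the rule described in the text."""
    charged = sorted(time_limit if t is None else t for t in times)
    n = len(charged)
    mid = n // 2
    if n % 2:  # odd number of runs: take the middle value
        return charged[mid]
    return (charged[mid - 1] + charged[mid]) / 2  # even: mean of middle two
```

For instance, with three runs of 4.2 s, a timeout, and 3.1 s under a 600 s limit, the charged times are 3.1, 4.2 and 600, so the median is 4.2 s.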
Third round, second stage (blue, weeks 10-12): for this
last stage of the competition, we select the top N solvers (determined
from the first-stage results by the same method that will be used for determining
the winners after the second stage, considering only
the best variant of a given solver) in each of the 6 categories and run
them on the M smallest unsolved benchmarks (again, separately in
each category); smaller benchmarks come earlier. We assume N =
M = 4 for now, but Laurent Simon may change this right after the deadline if
the number of benchmarks and/or solvers and/or our computing resources is far
from what we predict. Clearly, at this stage the time limit will be substantially
larger.
Results (yellow, week 13): awards will be given during
the SAT 2003 conference. We will also try to give a first analysis of the results (see
below for awards).
There will be 8 prizes: 6 for solvers and 2 for benchmarks.
- Solvers (6 awards): there will be two categories of solvers
(complete and incomplete). Clearly, a complete solver can take
part in the competition in both categories. Because there are three categories
of benchmarks, there will be a total of 2*3 = 6 winners,
the same number of second-place winners, etc. The rules we will use for ranking
solvers are:
- For a given category, a winning solver is determined by the total
number of "hard" (second-stage) benchmarks it was able to solve. If this
figure is the same for several solvers, then we look at the first-stage
results:
- First, the total number of solved series of benchmarks is counted
(a series counts as "solved" if at least one of its benchmarks is solved).
- If this figure is the same again, then the total number of solved
benchmarks is counted.
- If even this figure is the same, the total elapsed CPU time is counted.
All solvers that fall within 1 sec. * (number of benchmarks) of the
winner's time are also declared winners.
However, a solver that solves no (zero) benchmarks in a given category
cannot be declared a winner under any circumstances. Furthermore, if the
winner in a category was already awarded for that category in the previous
competition, then no award will be given for that category.
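The tie-break chain above amounts to a lexicographic ranking, which can be sketched in a few lines. The field names (`hard_solved`, `series_solved`, `solved`, `cpu_time`) are hypothetical; the organizers' actual data layout and scripts are not given.

```python
def ranking_key(s):
    """Tie-break chain from the rules: more hard (second-stage)
    benchmarks solved first, then more solved series, then more solved
    benchmarks, then less total CPU time. Field names are assumed."""
    return (-s["hard_solved"], -s["series_solved"], -s["solved"], s["cpu_time"])

def winners(solvers, n_benchmarks):
    """Co-winners under the rules above: solvers tied on the first
    three criteria whose CPU time is within 1 sec * (number of
    benchmarks) of the best; zero-solve solvers never win."""
    ranked = sorted((s for s in solvers if s["solved"] > 0), key=ranking_key)
    if not ranked:
        return []
    best = ranked[0]
    margin = 1.0 * n_benchmarks
    return [s for s in ranked
            if ranking_key(s)[:3] == ranking_key(best)[:3]
            and s["cpu_time"] <= best["cpu_time"] + margin]
```

With 40 benchmarks, two solvers tied on all solved counts whose total CPU times differ by less than 40 seconds would both be declared winners.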
- Benchmarks (2 awards): the authors of the
smallest benchmarks (in terms of the total number of literal occurrences)
that were not solved by any solver will be declared winners. There will
be two winners: one for a satisfiable benchmark and one for an unsatisfiable
one. Benchmarks submitted as "unknown" are not eligible. The following
benchmarks will be excluded from consideration here:
- Benchmarks based on formulas described in scientific papers
(or other publicly available sources) not authored by the person submitting
these benchmarks, if the description was aimed at presenting hard formulas
for algorithms or proof systems for (UN)SAT or another (co-)NP-hard problem.
- Benchmarks used in the previous SAT competitions (DIMACS, Beijing, Paderborn,
SAT2002) or benchmarks too similar to these.
- Benchmarks that are publicly available and known to be solved by at
least one solver (even not participating in the competition).
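The size measure for the benchmark award, the total number of literal occurrences, can be computed directly from a DIMACS CNF file. This is a sketch under the usual DIMACS conventions; the organizers' exact counting script is not given.

```python
def literal_occurrences(dimacs_text):
    """Count every literal occurrence in a DIMACS CNF file, i.e. every
    non-zero integer in the clause section (comments and the 'p' header
    are skipped, and clause-terminating zeros are not counted)."""
    count = 0
    for line in dimacs_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("c", "p")):
            continue
        count += sum(1 for tok in line.split() if tok != "0")
    return count
```

For example, the two clauses (x1 v -x3) and (x2 v x3 v -x1) contain 2 + 3 = 5 literal occurrences.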
The prize will be a beautiful certificate, unless some people are willing
to provide prizes to motivate the competition (if so, please contact the organizers).
What are the differences from the previous (2002) competition?
Based on some important biases in the SAT 2002 competition (identified
in our SAT 2002 competition report), we made the following decisions:
- No heuristics will be used to prevent useless launches (apart from preventing
incomplete solvers from being launched on known-UNSAT instances). In particular, no
heuristic will be based on the hypothetical relative hardness of benchmarks
within a given family.
- Only one variant of a solver (the best one during the first stage)
will be allowed in the second stage (round 3) of the contest.
- A special session with a time limit for solving a sequence
of benchmarks may be organized, depending on the number of people willing
to participate in that session, and provided that we get a satisfactory automated
framework for it.
- A board of 3 judges will be appointed to check that the competition rules
are respected and take decisions if a problem occurs.
- February 14: deadline for submitting solvers.
- February 21 (UPDATED): deadline for submitting benchmarks.
- February 28: buggy solvers returned to the authors.
- March 7: buggy solvers resubmitted.
- March 14: 1st stage of the competition starts.
- April 14 (or earlier): 2nd stage of the competition starts.
- May: results and awards during the SAT2003 conference.
Last modification: 3 December 2002.