Next: Graphical analysis of SOTA
Up: Other views of the
Previous: Other views of the
Geoff Sutcliffe and Christian Suttner have been running the CASC
competition for many years, and provide in [SS01]
some clues about what is a fair way to evaluate automated theorem
provers. One of the key ideas of their work is the notion of the State Of The Art (SOTA) solver. It is based on a subsumption
relationship between the sets of instances solved by the competing
solvers: ``a solver A is better than a solver B iff solver A solves a
strict superset of the instances solved by solver B''. The underlying
idea is that any solver that is the only one to solve some instance is
meaningful. The subsumption relationship provides a partial order
between the competing solvers, and a virtual solver representing the
advances of the whole community, the SOTA solver. This solver
can solve every problem solved by any of the competing solvers (so it
would be the unique maximal element for the subsumption
relationship).
There is little chance that the SOTA solver is a real solver,
so the notion of SOTA contributors is introduced by the
authors. They correspond to the maximal elements of the subsumption
relationship. The SOTA solver is equivalent to the set of SOTA
contributors running in parallel.
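The maximal-element computation described above can be sketched in a few lines of Python. The solver names and instance sets below are made up for illustration; they are not competition data.

```python
def sota_contributors(solved):
    """Maximal elements of the subsumption relation:
    solver A subsumes solver B iff solved[A] is a strict
    superset of solved[B]."""
    return [a for a in solved
            if not any(solved[b] > solved[a] for b in solved if b != a)]

# Hypothetical data: the set of instances solved by each solver.
solved = {
    "s1": {"i1", "i2", "i3"},  # subsumes s2
    "s2": {"i1", "i2"},        # strict subset of s1's instances
    "s3": {"i4"},              # uniquely solves i4, so it is a contributor
}

contributors = sota_contributors(solved)
print(contributors)  # ['s1', 's3']

# The virtual SOTA solver solves the union of the contributors' instances.
sota = set().union(*(solved[c] for c in contributors))
print(sorted(sota))  # ['i1', 'i2', 'i3', 'i4']
```

Note that the subsumed solver s2 drops out: running s1 and s3 in parallel solves everything any of the three solvers can solve.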
Tables 10, 11 and
12 show the problems solved by only one solver during
the first stage of the competition, for the three categories of
benchmarks (hors-concours solvers, which run outside the ranking, are
discarded). We can now identify the SOTA contributors from each table:
- Table 10 (industrial): the SOTA contributors for industrial
instances are 2clseq, berkmin, limmat, modoc and zchaff. Note that
2clseq worked very well on the rand_net benchmarks, and zchaff on the
pipe instances.
- Table 11 (hand-made): the SOTA contributors for hand-made
benchmarks are berkmin, limmat, rb2cl and zchaff.
- Table 12 (random): the SOTA contributors for randomly generated
benchmarks are 2clseq, dlmsat1, dlmsat3, OKsolver, unitwalk, usat05
and usat10.
Table 10: Uniquely solved industrial benchmarks during the first stage

Solver  | Benchmark (short name) | CPU (s)
2clseq  | bmc2/cnt09.cnf         | 198.00
2clseq  | rand_net/50-60-10      | 96.33
2clseq  | rand_net/60-40-10      | 16.03
2clseq  | rand_net/60-60-10      | 180.45
2clseq  | rand_net/60-60-5       | 172.22
2clseq  | rand_net/70-40-10      | 35.02
2clseq  | rand_net/70-40-5       | 52.15
2clseq  | rand_net/70-60-10      | 188.11
2clseq  | rand_net/70-60-5       | 611.71
2clseq  | satex-c/c6288-s        | 0.73
2clseq  | satex-c/c6288          | 0.77
berkmin | dinphil/dp10u09        | 321.67
berkmin | fpga-r/rcs_w9          | 2 054.50
berkmin | satex-c/cnf-r4-b2-k1.1 | 1 402.24
limmat  | Homer/homer16          | 2 315.05
limmat  | dinphil/dp12s12        | 5.21
modoc   | Homer/homer15          | 843.87
zchaff  | fifo8/fifo8_200        | 1 417.68
zchaff  | w10/w10_70.cnf         | 661.74
zchaff  | satex-/9vliw_bp_mc     | 1 274.94
zchaff  | fvp-u.2.0/5pipe        | 186.58
zchaff  | fvp-u.2.0/5pipe_3_oo   | 533.11
zchaff  | fvp-u.2.0/5pipe_4_ooo  | 1 667.30
zchaff  | fvp-u.2.0/5pipe_5_ooo  | 814.92
zchaff  | fvp-u.2.0/7pipe_bug    | 452.14
Table 11: Uniquely solved hand-made benchmarks during the first stage

Solver  | Benchmark                  | CPU (s)
berkmin | hanoi6_on                  | 295.04
berkmin | rope_1000                  | 2 058.62
berkmin | x1.1_32                    | 57.13
berkmin | x1_32                      | 82.18
limmat  | pyhala-braun-sat-40-4-01   | 1 295.77
rb2cl   | lisa21_1_a                 | 1 955.62
zchaff  | lisa21_2_a                 | 287.74
zchaff  | pyhala-braun-sat-40-4-02   | 971.59
zchaff  | pyhala-braun-unsat-35-4-01 | 1 474.30
zchaff  | pyhala-braun-unsat-35-4-02 | 1 653.80
zchaff  | Urquhart-s3-b5             | 1 507.11
zchaff  | x1.1_36                    | 786.60
zchaff  | x2_36                      | 1 828.81
Table 12: Uniquely solved randomly generated benchmarks during the first stage

Solver   | Benchmark                   | CPU (s)
2clseq   | 5col100_15_1                | 1 148.08
2clseq   | 5col100_15_4                | 1 097.78
2clseq   | 5col100_15_5                | 1 018.13
2clseq   | 5col100_15_6                | 1 353.14
2clseq   | 5col100_15_7                | 748.21
2clseq   | 5col100_15_8                | 1 160.66
2clseq   | 5col100_15_9                | 821.42
dlmsat1  | hgen5-v300-s1895562135      | 391.94
dlmsat1  | hgen5-v300-s528975468       | 512.81
dlmsat1  | hgen2-v300-s1807441418      | 651.35
dlmsat1  | hgen3-v450-s646636548       | 141.16
dlmsat1  | hgen4-v200-s2074278220      | 16.79
dlmsat1  | hgen4-v200-s812807056       | 1 158.20
dlmsat3  | hgen3-v450-s356974048       | 821.42
dlmsat3  | 4col280_9_2                 | 1 766.60
dlmsat3  | 4col280_9_4                 | 6.60
dlmsat3  | 4col280_9_6                 | 1 792.59
dlmsat3  | 4col280_9_7                 | 108.43
OKsolver | glassy-v399-s1993893641     | 1 162.07
OKsolver | glassy-v399-s524425695      | 814.44
OKsolver | glassybp-v399-s1499943388   | 273.31
OKsolver | glassybp-v399-s944837607    | 615.27
OKsolver | glassyb-v399-s1267848873    | 1 001.71
OKsolver | 5cnf_3900_3900_160          | 243.24
OKsolver | 5cnf_3900_3900_170          | 911.32
OKsolver | 5cnf_3900_3900_180          | 1 790.46
OKsolver | 5cnf_3900_3900_190          | 1 778.96
OKsolver | 5cnf_4300_4300_090          | 1 390.55
OKsolver | 5cnf_4300_4300_100          | 1 311.92
unitwalk | hgen2-v650-s2139597266      | 536.09
unitwalk | hgen2-v700-s543738649       | 32.38
unitwalk | hgen2-v700-s548148704       | 0.62
unitwalk | hgen3-v500-s1754870155      | 157.03
usat05   | okgen-c2550-v600-s552691850 | 5.57
usat10   | hgen3-v450-s1400022686      | 0.39
In order to obtain a total order among the solvers, one must first
classify the benchmarks themselves. For this, the notion of SOTA
contributors can be used [SS01]: benchmarks solved by
all SOTA contributors are said to be easy, while those solved by at
least one SOTA contributor (but not all) are called difficult. Note
that benchmarks not solved by any solver are not considered here.
Now, it is easy to rank the solvers according to the number of
difficult instances they can solve. Tables
13 and 14 provide a
summary for the SAT2002 competition, for SAT+UNSAT benchmarks and
satisfiable benchmarks respectively.
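The classification and the resulting ranking can be sketched as follows. The solver and benchmark names are again hypothetical, not competition results; the subsumption test is the strict-superset relation from the definition above.

```python
def sota_contributors(solved):
    """Solvers not strictly subsumed by any other solver."""
    return [a for a in solved
            if not any(solved[b] > solved[a] for b in solved if b != a)]

def classify(solved):
    """Split the solved benchmarks into easy (solved by every SOTA
    contributor) and difficult (solved by some, but not all, of them)."""
    contributors = sota_contributors(solved)
    easy = set.intersection(*(solved[c] for c in contributors))
    solvable = set().union(*(solved[c] for c in contributors))
    return easy, solvable - easy

# Hypothetical data.
solved = {
    "A": {"p1", "p2", "p3"},
    "B": {"p1", "p4"},
    "C": {"p1", "p2"},  # subsumed by A, so not a contributor
}

easy, difficult = classify(solved)
print(sorted(easy))       # ['p1']
print(sorted(difficult))  # ['p2', 'p3', 'p4']

# Total order: rank solvers by the number of difficult instances they solve.
ranking = sorted(solved, key=lambda s: len(solved[s] & difficult), reverse=True)
print(ranking[0])  # 'A' solves two difficult instances, more than B or C
```

Since every solver is either maximal or subsumed by a maximal one, the contributors' union covers all solved benchmarks, so every solved benchmark ends up either easy or difficult.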
Table 13: Number of difficult SAT+UNSAT benchmarks solved by each solver during the first stage

Industrial          | Hand-made           | Randomly generated
solver     #solved  | solver     #solved  | solver     #solved
berkmin    121      | berkmin    68       | OKsolver   548
zchaff     104      | zchaff     68       | marchII    543
2clseq     93       | limmat     47       | marchIIse  543
limmat     87       | simo       43       | 2clseq     369
simo       57       | 2clseq     34       | rb2cl      338
rb2cl      36       | OKsolver   33       | zchaff     305
OKsolver   35       | jquest     16       | simo       298
jquest     31       | marchIIse  13       | berkmin    263
marchIIse  29       | rb2cl      10       | limmat     221
modoc      26       | marchII    9        | modoc      219
blindsat   20       | modoc      7        | jquest     122
marchII    16       | blindsat   0        | blindsat   3
Table 14: Number of difficult satisfiable benchmarks solved by each solver during the first stage

Industrial          | Hand-made           | Randomly generated
solver     #solved  | solver     #solved  | solver     #solved
berkmin    68       | zchaff     35       | OKsolver   548
zchaff     64       | berkmin    33       | marchII    543
limmat     54       | limmat     29       | marchIIse  543
2clseq     46       | simo       27       | 2clseq     369
unitwalk   41       | OKsolver   23       | rb2cl      338
simo       39       | 2clseq     21       | zchaff     305
dlmsat2    37       | unitwalk   11       | simo       298
dlmsat1    36       | dlmsat1    11       | berkmin    263
rb2cl      35       | marchIIse  10       | dlmsat1    255
dlmsat3    35       | dlmsat3    10       | dlmsat2    234
saturn     34       | saturn     9        | dlmsat3    233
usat10     33       | usat05     9        | unitwalk   227
usat05     31       | usat10     9        | limmat     221
OKsolver   30       | dlmsat2    9        | modoc      219
marchIIse  26       | marchII    7        | saturn     208
jquest     24       | rb2cl      6        | usat10     206
blindsat   20       | modoc      5        | usat05     205
modoc      20       | jquest     4        | jquest     122
ga         18       | blindsat   0        | blindsat   3
marchII    13       | ga         0        | ga         3
What can we conclude? First of all, we awarded the SOTA
contributors. Looking at the SOTA ranking per category, berkmin,
zchaff and OKsolver would be awarded. Limmat, our fourth awardee, is
most of the time third behind zchaff and berkmin in the industrial and
hand-made categories. So the result of the SAT2002 competition looks
reasonable. Berkmin certainly deserves special attention from the
community: despite the bug that made it crash on more than 100
benchmarks, it is ranked first in the industrial category under the
SOTA system. The small impact of the crashes is probably due to
the fact that only the second SAT engine of Berkmin crashed; since
that engine is called for the hardest instances only, most instances
on which Berkmin crashed are hard for all solvers anyway.
Furthermore, we now have difficult and unsolved instances for the
next competition. The degree of difficulty provided by the SOTA
system can be used to tune solvers for the next competition: first
try to solve instances of medium difficulty, then try the really hard
ones. All that information will be included in the instances archive.
LE BERRE Daniel
2002-09-16