Shortcut: direct links to the selection of Application benchmarks and Crafted benchmarks. You can also reproduce the selection yourself!
The random benchmarks have been generated using a uniform random k-SAT instance generator, following the guidelines published by Oliver Kullmann for the SAT 2005 competition.
The instances are generated in two sizes: medium and large. Medium benchmarks target complete solvers while large benchmarks target incomplete solvers. For each medium parameter tuple (clause length, number of variables, clauses-to-variables ratio), 100 instances were generated randomly. The 2007 winner of the random SAT category, gNovelty+, was used to discriminate between SAT and UNSAT benchmarks: the benchmarks found SAT by gNovelty+ (usually in a few seconds) were classified as SAT, while the others were classified as UNKNOWN (probably UNSAT). Once classified that way, 10 SAT and 10 UNKNOWN benchmarks were selected per parameter tuple. The large instances were generated slightly below the threshold ratio, so they are probably SAT; we generated 100 of them, of which 10 were randomly selected per parameter tuple.
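For illustration, here is a minimal Python sketch of such a uniform generator. The actual generator follows Kullmann's SAT 2005 guidelines; the function names, the seed handling, and the absence of duplicate-clause filtering below are simplifications of ours.

```python
import random

def random_ksat(k, n, ratio, seed=None):
    """Generate round(ratio * n) clauses over n variables; each clause
    picks k distinct variables uniformly at random and negates each
    with probability 1/2. Illustrative sketch only."""
    rng = random.Random(seed)
    clauses = []
    for _ in range(round(ratio * n)):
        variables = rng.sample(range(1, n + 1), k)
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses

def to_dimacs(clauses, n):
    """Serialize a clause list in DIMACS CNF format."""
    lines = ["p cnf %d %d" % (n, len(clauses))]
    lines += [" ".join(map(str, c)) + " 0" for c in clauses]
    return "\n".join(lines)

# One medium 3-SAT instance at the threshold ratio from the table below:
print(to_dimacs(random_ksat(k=3, n=360, ratio=4.26, seed=0), n=360))
```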
K | Ratio (medium) | Number of variables (medium, SAT+UNKNOWN) | Ratio (large) | Number of variables (large, probably SAT) |
---|---|---|---|---|
3 | 4.26 | 360, 380, 400, 420, 440, 460, 480, 500, 520, 540, 560 | 4.2 | 2000, 4000, 6000, 8000, 10000, 12000, 14000, 16000, 18000, 20000*, 22000*, 24000*, 26000* |
5 | 21.3 | 90, 100, 110, 120 | 20 | 700, 800, 900, 1000, 1100 |
7 | 89 | 60, 65, 70, 75 | 81 | 140, 160, 180, 200, 220 |
Total SAT | | 190 | | 190+40* |
Total UNSAT | | 190 | | NA |
Total | | 380 | | 230 |

Grand total: 610 benchmarks.
The rationale was to have as many SAT as UNKNOWN benchmarks for complete solvers, and as many UNKNOWN benchmarks as large probably-SAT ones, i.e., dedicated benchmarks for both complete and incomplete solvers.
After the first stage of the random category, it appeared that the incomplete solvers were able to solve essentially all the large instances for K=3. So, in agreement with the judges, it was decided to add larger benchmarks (marked with a star in the table above) to challenge the solvers during the second stage.
No more than 10% of the benchmarks should come from the same source.
It was decided to build a set of 300 instances. As a consequence, no more than 30 benchmarks could be accepted per source.
The final selection of benchmarks should contain 45% existing benchmarks and 55% submitted benchmarks.
As a consequence, the final set of 300 benchmarks should contain 165 submitted benchmarks and 135 existing benchmarks.
The final selection should contain 10% easy, 40% medium and 50% hard benchmarks.
As a consequence, among the 135 existing benchmarks, we should have exactly 13 easy, 54 medium and 68 hard benchmarks, while among the 165 submitted benchmarks we should have exactly 17 easy, 66 medium and 82 hard benchmarks (the sketch after these rules reproduces this arithmetic).
Duplicate benchmarks found after the selection was done will simply be removed from the selection. No other benchmarks will be added to the selection.
As a consequence, the final set of benchmarks will contain fewer than 300 benchmarks.
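These quotas are plain arithmetic on the 300-instance budget. The Python sketch below reproduces them; the function name and the half-to-even tie-breaking are illustrative choices of ours, not the organizers' code.

```python
TOTAL = 300

# "No more than 10% of the benchmarks should come from the same source."
per_source_cap = TOTAL // 10                     # 30 benchmarks per source

# "45% existing benchmarks and 55% submitted benchmarks."
existing = round(TOTAL * 0.45)                   # 135
submitted = TOTAL - existing                     # 165

def difficulty_quotas(total):
    """Split a budget into 10% easy / 40% medium / 50% hard.
    Python's round() (half to even) happens to reproduce the
    published quotas; easy takes the remainder so the three
    parts always sum to the total."""
    medium = round(total * 0.40)                 # 54 and 66
    hard = round(total * 0.50)                   # 67.5 -> 68, 82.5 -> 82
    return total - medium - hard, medium, hard

print(per_source_cap)                # 30
print(difficulty_quotas(existing))   # (13, 54, 68)
print(difficulty_quotas(submitted))  # (17, 66, 82)
```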
In the case of the application category, the benchmarks come from several sources (detailed in the tables below).
Note that the IBM benchmarks disappeared from the existing benchmarks because they are no longer available from the IBM web site.
The benchmarks have been classified as EASY/MEDIUM/HARD according to the results of the two winners of the SAT07 competition, industrial category (RSAT and PICOSAT), plus the widely used Minisat. Since those three solvers participated in the three previous competitive events, we simply used the publicly available results to classify the benchmarks.
Note that such a definition is relative to the best solvers of 2007 for that category. As a consequence, some SAT solvers might find benchmarks classified as easy hard to solve, while others may be able to solve some hard benchmarks in reasonable time.
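One plausible reading of that classification is sketched below; the cutoff value, the timeout handling and the exact three-way rule are illustrative assumptions of ours, not the published criteria.

```python
# Reference solvers named in the text; thresholds and rule shape are assumed.
REFERENCE_SOLVERS = ("RSAT", "PICOSAT", "Minisat")
EASY_CUTOFF = 60.0  # assumed cutoff in seconds, not the organizers' value

def classify(times):
    """times: solver name -> solve time in seconds, or None/absent if unsolved."""
    solved = [times[s] for s in REFERENCE_SOLVERS if times.get(s) is not None]
    if len(solved) == len(REFERENCE_SOLVERS) and max(solved) <= EASY_CUTOFF:
        return "EASY"    # solved quickly by all three reference solvers
    if solved:
        return "MEDIUM"  # solved by at least one reference solver
    return "HARD"        # solved by none of them

print(classify({"RSAT": 3.2, "PICOSAT": 8.9, "Minisat": 12.0}))  # EASY
print(classify({"RSAT": None, "Minisat": 700.0}))                # MEDIUM
print(classify({}))                                              # HARD
```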
Origin | EASY SAT | EASY UNSAT | EASY ALL | MEDIUM SAT | MEDIUM UNSAT | MEDIUM ALL | HARD SAT | HARD UNSAT | HARD UNKNOWN | HARD ALL | Total |
---|---|---|---|---|---|---|---|---|---|---|---|
Existing | 11 | 29 | 45 | 66 | 87 | 147 | 8 | 29 | 45 | 82 | 295 |
Submitted | 60 | 38 | 98 | 38 | 60 | 98 | 8 | 12 | 82 | 102 | 298 |
Total | 71 | 67 | 143 | 104 | 147 | 245 | 16 | 41 | 127 | 184 | 593 |
Details | |||||||||||
Submitted | |||||||||||
aprove09 | 21 | 0 | 21 | 4 | 0 | 4 | - | - | - | - | 25 |
bio I | 3 | 0 | 3 | 6 | 11 | 17 | - | - | - | - | 20 |
bio II | 9 | 0 | 9 | 4 | 3 | 7 | - | - | 24 | 24 | 40 |
c32sat | 0 | 1 | 1 | 1 | 3 | 4 | - | 3 | 2 | 5 | 10 |
bitverif | - | 14 | 14 | - | 22 | 22 | - | 6 | 23 | 29 | 65 |
crypto | 5 | - | 5 | 7 | 6 | 13 | 4 | - | 20 | 24 | 42 |
diagnosis | 22 | 23 | 45 | 16 | 15 | 31 | 4 | 3 | 13 | 20 | 96 |
Existing | |||||||||||
grieu | - | - | - | 7 | - | 7 | 3 | - | - | 3 | 10 |
palacios | - | - | - | 7 | 12 | 19 | 1 | 2 | 5 | 8 | 27 |
jarvisalo | - | - | - | - | 4 | 4 | - | 1 | 2 | 3 | 7 |
aprove07 | - | - | - | 1 | 8 | 9 | - | 2 | 5 | 7 | 16 |
post | - | 1 | 1 | 2 | 3 | 5 | - | 4 | - | 4 | 10 |
narain | 1 | - | 1 | 2 | 1 | 3 | - | - | 1 | 1 | 5 |
mizh | - | - | - | 13 | - | 13 | - | - | - | - | 13 |
goldb | - | - | - | - | 5 | 5 | - | 5 | - | 5 | 10 |
manolios | - | 4 | 4 | - | 21 | 21 | - | 5 | - | 5 | 26 |
babic | 5 | 15 | 20 | - | 10 | 10 | - | - | - | - | 30 |
misc | 5 | 8 | 13 | 7 | 11 | 18 | 1 | 1 | - | 2 | 33 |
diagnosis07 | - | - | - | 10 | 6 | 16 | 3 | 1 | 27 | 30 | 46 |
velev | - | 1 | 1 | 17 | 6 | 23 | - | 8 | 5 | 13 | 37 |
The number of instances submitted by a given source ranged from 10 to 96, so it was not possible to accept all the submissions as is. Limiting each submitter to at most 30 instances changed the landscape a bit:
Origin | EASY SAT | EASY UNSAT | EASY ALL | MEDIUM SAT | MEDIUM UNSAT | MEDIUM ALL | HARD SAT | HARD UNSAT | HARD UNKNOWN | HARD ALL | Total |
---|---|---|---|---|---|---|---|---|---|---|---|
Existing | 11 | 29 | 40 | 52 | 84 | 136 | 6 | 29 | 34 | 69 | 245 |
Submitted | 26 | 3 | 29 | 25 | 40 | 65 | 8 | 10 | 63 | 81 | 175 |
Total | 33 | 32 | 69 | 77 | 124 | 201 | 14 | 39 | 97 | 150 | 420 |
Details | |||||||||||
Submitted | |||||||||||
aprove09 | 21 | 0 | 21 | 4 | 0 | 4 | - | - | - | - | 25 |
bio I | 3 | 0 | 3 | 6 | 11 | 17 | - | - | - | - | 20 |
bio II | 1 | 0 | 1 | 4 | 3 | 7 | - | - | 22 | 22 | 30 |
c32sat | 0 | 1 | 1 | 1 | 3 | 4 | - | 3 | 2 | 5 | 10 |
bitverif | - | 1 | 1 | - | 11 | 11 | - | 4 | 14 | 18 | 30 |
crypto | 1 | - | 1 | 6 | 5 | 11 | 4 | - | 14 | 18 | 30 |
diagnosis | - | 1 | 1 | 4 | 7 | 11 | 4 | 3 | 11 | 18 | 30 |
Existing | |||||||||||
grieu | - | - | - | 7 | - | 7 | 3 | - | - | 3 | 10 |
palacios | - | - | - | 7 | 12 | 19 | 1 | 2 | 5 | 8 | 27 |
jarvisalo | - | - | - | - | 4 | 4 | - | 1 | 2 | 3 | 7 |
aprove07 | - | - | - | 1 | 8 | 9 | - | 2 | 5 | 7 | 16 |
post | - | 1 | 1 | 2 | 3 | 5 | - | 4 | - | 4 | 10 |
narain | 1 | - | 1 | 2 | 1 | 3 | - | - | 1 | 1 | 5 |
mizh | - | - | - | 13 | - | 13 | - | - | - | - | 13 |
goldb | - | - | - | - | 5 | 5 | - | 5 | - | 5 | 10 |
manolios | - | 4 | 4 | - | 21 | 21 | - | 5 | - | 5 | 26 |
babic | 5 | 15 | 20 | - | 10 | 10 | - | - | - | - | 30 |
misc | 5 | 8 | 13 | 7 | 11 | 18 | 1 | 1 | - | 2 | 33 |
diagnosis07 | - | - | - | 6 | 6 | 12 | 1 | 1 | 16 | 18 | 30 |
velev | - | 1 | 1 | 7 | 4 | 11 | - | 8 | 5 | 13 | 25 |
Finally, once the percentages of existing/submitted and easy/medium/hard benchmarks are applied, we end up with the following selection of benchmarks for the competition:
Origin (List) | EASY SAT | EASY UNSAT | EASY ALL | MEDIUM SAT | MEDIUM UNSAT | MEDIUM ALL | HARD SAT | HARD UNSAT | HARD UNKNOWN | HARD ALL | Total |
---|---|---|---|---|---|---|---|---|---|---|---|
Existing | 1 | 10 | 11 | 22 | 33 | 55 | 6 | 29 | 34 | 69 | 135 |
Submitted | 18 | 1 | 19 | 25 | 40 | 65 | 8 | 10 | 63 | 81 | 165 |
Total | 19 | 11 | 30 | 47 | 73 | 120 | 14 | 39 | 97 | 150 | 300 |
During the selection of benchmarks, the initial set contained several duplicated benchmarks. This was discovered once the benchmarks were added to the competition system, because several statistics regarding the benchmarks are computed automatically. manol-pipe-f9b.cnf and manol-pipe-g10nid.cnf appeared in both the HARD and MEDIUM categories, and the post* benchmarks submitted to the SAT09 competition were exactly the ones used during the SAT Race 08 (we simply removed holes in the variable numbering). It was decided to remove those duplicate benchmarks from the selection, without adding new ones.
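Duplicates that differ only in their variable numbering (as with the post* benchmarks) escape plain file hashing; a normalizing fingerprint of the kind sketched below would catch them. This is an illustration, not the competition system's actual code, and it assumes one clause per DIMACS line.

```python
import hashlib

def normalized_fingerprint(dimacs_text):
    """Fingerprint a CNF after renumbering variables by first occurrence
    (closing holes in the numbering) and sorting literals within clauses.
    Assumes one clause per line, each terminated by 0."""
    renum, clauses = {}, []
    for line in dimacs_text.splitlines():
        line = line.strip()
        if not line or line[0] in "cp":          # skip comments and header
            continue
        clause = []
        for lit in map(int, line.split()[:-1]):  # drop the trailing 0
            var = abs(lit)
            new = renum.setdefault(var, len(renum) + 1)
            clause.append(new if lit > 0 else -new)
        clauses.append(tuple(sorted(clause)))
    return hashlib.sha1(repr(tuple(clauses)).encode()).hexdigest()

# Same formula over variables 1,2,3 vs. 2,5,9: identical fingerprints.
a = "p cnf 3 2\n1 -2 0\n2 3 0\n"
b = "p cnf 9 2\n2 -5 0\n5 9 0\n"
assert normalized_fingerprint(a) == normalized_fingerprint(b)
```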
Origin (Corrected List) | EASY SAT | EASY UNSAT | EASY ALL | MEDIUM SAT | MEDIUM UNSAT | MEDIUM ALL | HARD SAT | HARD UNSAT | HARD UNKNOWN | HARD ALL | Total |
---|---|---|---|---|---|---|---|---|---|---|---|
Existing | 1 | 9 | 10 | 21 | 33 | 54 | 6 | 23 | 34 | 63 | 127 |
Submitted | 18 | 1 | 19 | 25 | 40 | 65 | 8 | 10 | 63 | 81 | 165 |
Total | 19 | 10 | 29 | 46 | 73 | 119 | 14 | 33 | 97 | 144 | 292 |
Details | |||||||||||
Submitted | |||||||||||
aprove09 | 14 | - | 14 | 4 | - | 4 | - | - | - | - | 18 |
bio I | 3 | - | 3 | 6 | 11 | 17 | - | - | - | - | 20 |
bio II | 1 | - | 1 | 4 | 3 | 7 | - | - | 22 | 22 | 30 |
c32sat | - | - | - | 1 | 3 | 4 | - | 3 | 2 | 5 | 9 |
bitverif | - | 1 | 1 | - | 11 | 11 | - | 4 | 14 | 18 | 30 |
crypto | - | - | - | 6 | 5 | 11 | 4 | - | 14 | 18 | 29 |
diagnosis | - | - | - | 4 | 7 | 11 | 4 | 3 | 11 | 18 | 29 |
Existing | |||||||||||
grieu | - | - | - | 5 | - | 5 | 3 | - | - | 3 | 8 |
palacios | - | - | - | 1 | 6 | 7 | 1 | 2 | 5 | 8 | 15 |
jarvisalo | - | - | - | - | 3 | 3 | - | 1 | 2 | 3 | 6 |
aprove07 | - | - | - | - | - | - | - | 2 | 5 | 7 | 7 |
narain | - | - | - | - | - | - | - | - | 1 | 1 | 1 |
mizh | - | - | - | 7 | - | 7 | - | - | - | - | 7 |
goldb | - | - | - | - | 1 | 1 | - | 5 | - | 5 | 6 |
manolios | - | 2 | 2 | - | 11 | 11 | - | 3 | - | 3 | 15 |
babic | - | 3 | 3 | - | - | - | - | - | - | - | 3 |
misc | 1 | 4 | 5 | 2 | 6 | 8 | 1 | 1 | - | 2 | 15 |
diagnosis07 | - | - | - | 2 | 3 | 5 | 1 | 1 | 16 | 18 | 23 |
velev | - | - | - | 4 | 3 | 7 | - | 8 | 5 | 13 | 20 |
For the crafted category, the selection of benchmarks was made from the SAT07 crafted-category benchmarks (which contained a selection of benchmarks from all the previous competitions).
We followed the same principle as for the application category. The SAT solvers chosen to evaluate the difficulty of the benchmarks were the winners of the 2007 competition (March-KS and Satzilla-Crafted), plus Minisat 2007, which would have been the winner of that category under this year's scoring scheme.
Origin | EASY SAT | EASY UNSAT | EASY ALL | MEDIUM SAT | MEDIUM UNSAT | MEDIUM ALL | HARD SAT | HARD UNSAT | HARD UNKNOWN | HARD ALL | Total |
---|---|---|---|---|---|---|---|---|---|---|---|
Existing | - | 4 | 4 | 28 | 56 | 84 | 5 | 18 | 90 | 113 | 201 |
Submitted | 118 | 10 | 128 | 75 | 9 | 84 | 6 | - | 355 | 371 | 583 |
Total | 118 | 14 | 132 | 103 | 65 | 168 | 11 | 18 | 445 | 484 | 784 |
Following the rule applied for the application category, no single source should have more than 30 benchmarks. The updated numbers follow.
Origin | EASY SAT | EASY UNSAT | EASY ALL | MEDIUM SAT | MEDIUM UNSAT | MEDIUM ALL | HARD SAT | HARD UNSAT | HARD UNKNOWN | HARD ALL | Total |
---|---|---|---|---|---|---|---|---|---|---|---|
Existing | - | 4 | 4 | 24 | 51 | 75 | 6 | 18 | 85 | 109 | 188 |
Submitted | 34 | 10 | 44 | 50 | 9 | 59 | 6 | - | 70 | 76 | 179 |
Total | 34 | 14 | 48 | 74 | 60 | 134 | 12 | 18 | 155 | 185 | 367 |
Details | |||||||||||
Submitted | |||||||||||
parity games | 6 | 8 | 14 | 7 | 2 | 9 | - | - | 7 | 7 | 30 |
sgen09 | 5 | 1 | 6 | 4 | 2 | 6 | - | - | 18 | 18 | 30 |
modcircuits | - | 1 | 1 | 4 | 1 | 5 | - | - | 13 | 13 | 19 |
ramseycube | 1 | - | 1 | 5 | 3 | 8 | - | - | 1 | 1 | 10 |
rbsat | - | - | - | 11 | 1 | 12 | - | - | 18 | 18 | 30 |
edge matching | - | - | - | 18 | - | 18 | 6 | - | 6 | 12 | 30 |
sgi | 22 | - | 22 | 1 | - | 1 | - | - | 7 | 7 | 30 |
Existing | |||||||||||
contest 02 | - | - | - | - | 3 | 3 | 4 | - | 7 | 11 | 15 |
contest 03 | - | - | - | - | 1 | 1 | - | 6 | 19 | 25 | 26 |
contest 04 | - | - | - | 1 | 7 | 8 | 1 | 2 | 17 | 20 | 28 |
jarvisalo | - | - | - | 14 | 4 | 18 | - | 2 | - | 2 | 20 |
QG | - | - | - | 4 | 16 | 20 | - | 1 | 7 | 8 | 28 |
phnf | - | - | - | - | 1 | 1 | - | 1 | 8 | 9 | 10 |
misc | - | - | - | 3 | 3 | 6 | - | - | 10 | 10 | 16 |
sgen07 | - | 4 | 4 | - | 6 | 6 | - | 2 | 4 | 6 | 16 |
sabharwal | - | - | - | 2 | 10 | 12 | 1 | 4 | 13 | 18 | 30 |
Among those benchmarks, 300 were randomly selected to match the easy/medium/hard ratios decided by the judges.
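Such a draw is a stratified random sample: one quota per (origin, difficulty) stratum, each filled uniformly at random. A sketch follows, with the quotas taken from the table below; the function and the data layout are ours.

```python
import random

def stratified_select(pools, quotas, seed=2009):
    """Draw quotas[stratum] benchmarks uniformly at random from each
    pools[stratum] list; strata are (origin, difficulty) pairs."""
    rng = random.Random(seed)
    picked = []
    for stratum, quota in quotas.items():
        picked += rng.sample(pools[stratum], quota)
    return picked

# Crafted-category quotas per (origin, difficulty), from the table below:
quotas = {("existing", "EASY"): 4, ("existing", "MEDIUM"): 61, ("existing", "HARD"): 74,
          ("submitted", "EASY"): 26, ("submitted", "MEDIUM"): 59, ("submitted", "HARD"): 76}
assert sum(quotas.values()) == 300
```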
Origin (List) | EASY SAT | EASY UNSAT | EASY ALL | MEDIUM SAT | MEDIUM UNSAT | MEDIUM ALL | HARD SAT | HARD UNSAT | HARD UNKNOWN | HARD ALL | Total |
---|---|---|---|---|---|---|---|---|---|---|---|
Existing | - | 4 | 4 | 19 | 42 | 61 | 4 | 12 | 58 | 74 | 139 |
Submitted | 19 | 7 | 26 | 50 | 9 | 59 | 6 | - | 70 | 76 | 161 |
Total | 19 | 11 | 30 | 69 | 65 | 120 | 11 | 10 | 129 | 150 | 300 |
Details | |||||||||||
Submitted | |||||||||||
parity games | 5 | 5 | 10 | 7 | 2 | 9 | - | - | 7 | 7 | 26 |
sgen09 | 1 | 1 | 2 | 4 | 2 | 6 | - | - | 18 | 18 | 26 |
modcircuits | - | 1 | 1 | 4 | 1 | 5 | - | - | 13 | 13 | 19 |
ramseycube | 1 | - | 1 | 5 | 3 | 8 | - | - | 1 | 1 | 10 |
rbsat | - | - | - | 11 | 1 | 12 | - | - | 18 | 18 | 30 |
edge matching | - | - | - | 18 | - | 18 | 6 | - | 6 | 12 | 30 |
sgi | 12 | - | 12 | 1 | - | 1 | - | - | 7 | 7 | 20 |
Existing | |||||||||||
contest 02 | - | - | - | - | 2 | 2 | 2 | - | 5 | 7 | 9 |
contest 03 | - | - | - | - | 1 | 1 | - | 3 | 11 | 14 | 15 |
contest 04 | - | - | - | - | 6 | 6 | 1 | 1 | 14 | 16 | 22 |
jarvisalo | - | - | - | 11 | 3 | 14 | - | 1 | - | 1 | 15 |
QG | - | - | - | 3 | 14 | 17 | - | 1 | 4 | 5 | 22 |
phnf | - | - | - | - | 1 | 1 | - | 1 | 6 | 7 | 8 |
misc | - | - | - | 3 | 2 | 5 | - | - | 9 | 9 | 14 |
sgen07 | - | 4 | 4 | - | 4 | 4 | - | 2 | 4 | 6 | 14 |
sabharwal | - | - | - | 2 | 9 | 11 | 1 | 3 | 5 | 9 | 20 |
Errata: 3 benchmarks were wrongly added twice to the selection, because each matched two different regular expressions used to classify the benchmarks.
Those benchmarks are contest04-connm-ue-csp-sat-n800-d0.04-s362703357.sat05-548.reshuffled-07.cnf, contest04-connm-ue-csp-sat-n600-d0.02-s1676244754.sat05-535.reshuffled-07.cnf and contest03-SGI_30_50_30_20_3-dir.sat05-440.reshuffled-07.cnf.
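The pitfall is easy to reproduce: collecting files by concatenating the matches of several regular expressions counts a benchmark once per matching pattern, while deduplicating through a set counts it once. A sketch with made-up patterns:

```python
import re

patterns = [re.compile(r"contest04-.*-csp-sat-.*\.cnf$"),  # made-up patterns;
            re.compile(r".*\.reshuffled-07\.cnf$")]        # a file can match both

files = ["contest04-connm-ue-csp-sat-n800-d0.04-s362703357.sat05-548.reshuffled-07.cnf",
         "other-family-instance.reshuffled-07.cnf"]

# Buggy: list concatenation keeps one copy per matching pattern.
selected = [f for p in patterns for f in files if p.match(f)]

# Fixed: a set keeps each benchmark once, however many patterns match it.
deduped = sorted({f for p in patterns for f in files if p.match(f)})

print(len(selected), len(deduped))  # 3 2
```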
Furthermore, the parity-games generator produces exactly the same benchmark for n=2, i=2 with or without the ce and ci parameters. As such, the instance_n2_i2_pp_ci_ce.cnf benchmark has also been removed from the selection.
The Q32inK08.cnf benchmark from the RamseyCube family wrongly encoded the very same problem as Q3inK09.cnf, so it has been removed as well.
Finally, the sgen generator submitted to the 2009 competition in fact contains two generators: one for satisfiable benchmarks and one for unsatisfiable benchmarks. Only the satisfiable generator was new; the unsatisfiable generator is exactly the same as the one used in the SAT 2007 competition. Therefore, the SAT07 and SAT09 unsat sgen benchmarks should be considered to come from the same source, and all 14 SAT07 benchmarks from Ivor Spence had to be removed to respect the rule that no more than 30 benchmarks may come from the same source.
The final selection without duplicates is thus a set of 281 benchmarks.