Updated on March 19 with poll results. The logins of the submitters who voted for a given scoring scheme are displayed so that submitters can check that their vote has been properly taken into account.

Lexicographical NBSOLVED, sum ti (9 votes)
sat09-18, sat09-25, sat09-22, sat09-33, sat09-24, sat09-43, sat09-21, judge1, sat09-46
Cumulative time based, with timeout penalty, PENALTY = 10 (3 votes)
sat09-30, sat09-13, sat09-28
Cumulative time based, with timeout and robustness penalties (4 votes)
sat09-19, sat09-17, sat09-45, sat09-37

The competitors are expected to vote for the ranking scheme that will be used during the competition, with one vote per submitter (i.e. per login). A submitter can propose alternative ranking schemes based on the data available in the competition framework. The submitter's vote (preferred scoring scheme) is to be sent by email to organizers at satcompetition dot org.

Important poll deadlines

March 11, 2009
Proposal of extra schemes.
March 18, 2009 (UPDATED, NOT ENOUGH VOTES ON MARCH 13)
Poll for the preferred scheme.
March 18, 2009
Poll for the constant.

Rationale

We have heard at least two criticisms against the scoring scheme used in the previous competitions. First, it aggregates two criteria which are orthogonal, and second, the score of a solver depends on other solvers.

From a practical point of view, a standard user of SAT technology wants to identify the solvers which are the most likely to solve their instances. This corresponds to a notion of robustness. The "best" solvers are the ones which solve the greatest number of instances, preferably in the smallest time.

From a "scientific" point of view, the community would like to award an innovative solver. A solver which solves instances that other solvers don't (SOTAC) is an example of such kind of solvers.

The purse system mixes these two points of view in a single ranking, but not everybody agrees with that.

In particular, the purse system has the side effect of making the scores (and thus the ranking) of two solvers depend on the results of a third solver.

So the idea this year is that the score of a solver should not depend on the other solvers. This score should only depend on the chosen benchmark set, the chosen time and memory limits, and the chosen hardware. Hence the name SATSPEC09, to indicate that anybody would potentially be able to compare their own solver, using that score, with the solvers submitted to the competition. It would also make it possible to score additional solvers, such as all the winners of the previous competitions, without changing the scores of the solvers of this year's competition.

There are several criteria which can be used to compare solvers. Clearly, ranking solvers is a multi-criteria decision, and some of those criteria are quite orthogonal. Depending on the point of view, it may or may not make sense to aggregate some of these criteria.

One interesting question is whether the speed of a faster solver can compensate for its failure to solve an instance. For example, assume solver A can solve 100 instances in 1000s (cumulated time) and solver B can solve the same 100 instances in only 100s. At this point, solver B is clearly better than solver A. Now assume solver A can solve one more instance than B. Which solver is the best? The answer is probably not unique and certainly depends on the user's applications and expectations.
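
To make this trade-off concrete, here is a tiny Python illustration using the cumulative scheme with timeout penalty defined below; all numbers, including the 200s assumed for the extra instance and the choice of PENALTY, are made up:

    # Solver A solves all 101 instances; solver B solves 100 and times out once.
    # Illustrative values: TIMEOUT = 1200 s, PENALTY = 10, extra instance = 200 s.
    score_A = 1000 + 200                       # cumulated time, no penalty
    score_B = 100 + (101 - 100) * 1200 * 10    # cumulated time + timeout penalty
    print(score_A, score_B)                    # 1200 12100: A is ranked first

With these values, A is ranked first; in this particular example, only a PENALTY below 1 would let B's speed outweigh its failure.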

Possible scoring schemes

The competition framework generates the following data:

NBTOTAL
Total number of benchmarks to solve
NBSOLVED
Total number of benchmarks solved within a given timeout
NBUNSOLVEDSERIES
Total number of series (sets) of benchmarks for which the solver was unable to solve any element.
TIMEOUT
Time allowed to solve a given benchmark
ti
Time needed to solve a given benchmark, within the time limit
PENALTY
Constant to use as a penalty for benchmarks not solved within the timeout
SERIESPENALTY
Constant to use as a penalty for each series of benchmarks of which the solver cannot solve any member.
We can use those numbers in several ways to rank the solvers.
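
To fix notation for the sketches below, here is a minimal Python setup showing how those quantities can be derived from raw results; the data layout and all values are hypothetical, not the actual output of the competition framework.

    # Hypothetical per-solver results (illustration only): for each series,
    # the CPU time in seconds of every solved benchmark, or None when the
    # solver did not answer within the timeout.
    TIMEOUT = 1200.0         # example time limit per benchmark, in seconds
    PENALTY = 100            # example timeout penalty constant
    SERIESPENALTY = 10000.0  # example penalty per fully unsolved series

    results = {
        "series-A": [3.2, 117.9, None],
        "series-B": [None, None, None],  # no member of series-B was solved
    }

    NBTOTAL = sum(len(times) for times in results.values())
    solved_times = [t for times in results.values()
                    for t in times if t is not None]
    NBSOLVED = len(solved_times)
    NBUNSOLVEDSERIES = sum(1 for times in results.values()
                           if all(t is None for t in times))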

Lexicographical NBSOLVED, sum ti

Since the first edition of the SAT Race, we have seen in many papers and presentations experiments made on the SAT Race benchmark sets (100 benchmarks each). The tables are usually sorted lexicographically, first on the number of benchmarks solved, then on the cumulative CPU time needed to solve those benchmarks.

It is a simple ranking scheme, easy to reproduce, and one that SAT solver developers can easily use to see where their solver stands among state-of-the-art SAT solvers, by running their own solver on the sets of benchmarks on computers similar to the ones used during the Races.

	(NBSOLVED, sum ti)

Note: in this scheme, a fast solver cannot compensate for its inability to solve an instance.
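
A minimal sketch of this scheme in Python, assuming the hypothetical results layout above; ranking amounts to sorting on a key that prefers more solved instances, then less cumulated time:

    def lex_key(nbsolved, solved_times):
        """Sort key for the lexicographic scheme: more solved instances
        first, ties broken by smaller cumulated CPU time."""
        return (-nbsolved, sum(solved_times))

    # Example: sorted(all_solvers, key=lambda s: lex_key(s.nbsolved, s.times))
    # places the best solver first.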

Cumulative time based, with timeout penalty

Another simple way to rank the solvers is to sum up the CPU times of the solver on the set of benchmarks, and to apply a penalty whenever the solver times out or aborts early.

	sum ti + (NBTOTAL-NBSOLVED)*TIMEOUT*PENALTY

Note: in this scheme, a fast solver can compensate for its inability to solve an instance. The value of PENALTY determines how much faster a solver must be to compensate for the loss of an instance.
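
A minimal Python sketch of this score (lower is better), under the same hypothetical data layout as above:

    def timeout_penalty_score(solved_times, nbtotal, timeout, penalty):
        """sum ti + (NBTOTAL - NBSOLVED) * TIMEOUT * PENALTY; lower is better."""
        nbsolved = len(solved_times)
        return sum(solved_times) + (nbtotal - nbsolved) * timeout * penalty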

Cumulative time based, with timeout penalty, log based

The idea of this scoring is the same as the preceding one, but it is based on the log of each CPU time, so that solvers whose running times are in the same order of magnitude receive close scores.

	sum log10(1+ti) + (NBTOTAL-NBSOLVED)*log10((1 + TIMEOUT)*PENALTY)

The motivation for using a log is that human beings are probably more receptive to the log of a value than to the value itself (100s versus 1000s makes a significant difference, while 100s versus 200s is not so different). The 1 is introduced to avoid negative scores for timings below 1s. Of course, log10((1 + TIMEOUT)*PENALTY) can be rewritten as log10(1 + TIMEOUT) + log10(PENALTY).

Note: in this scheme, a fast solver can compensate for its inability to solve an instance. The value of PENALTY determines how much faster a solver must be to compensate for the loss of an instance. For example, if PENALTY equals 100, failing on an instance can be compensated if the solver is 100 times faster on another instance, or 10 times faster on 2 instances, and so on. Possible values for PENALTY are 10, 100, 1000, and 10000.
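
A minimal Python sketch of the log-based score (lower is better), matching the formula above; note that failing on one instance costs log10(1 + TIMEOUT) + log10(PENALTY) points:

    import math

    def log_score(solved_times, nbtotal, timeout, penalty):
        """sum log10(1 + ti)
        + (NBTOTAL - NBSOLVED) * log10((1 + TIMEOUT) * PENALTY);
        lower is better."""
        unsolved = nbtotal - len(solved_times)
        return (sum(math.log10(1.0 + t) for t in solved_times)
                + unsolved * math.log10((1.0 + timeout) * penalty))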

Cumulative time based, with timeout and robustness penalties

Proposed by Marijn Heule

The rationale of this scoring scheme is that a robust solver, i.e. a solver able to solve a wide range of benchmarks (it could also be called multi-purpose), should be ranked better than a solver which is very good on very specific benchmarks. To do so, we add a new penalty for each series (kind of benchmark) for which the solver has not been able to solve any member.

	sum ti + (NBTOTAL-NBSOLVED)*TIMEOUT*PENALTY + NBUNSOLVEDSERIES*SERIESPENALTY

Note: we could also use a log scale on those numbers.
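
A minimal Python sketch of this scheme, reusing the hypothetical results layout introduced earlier (lower is better):

    def robustness_score(results, timeout, penalty, seriespenalty):
        """sum ti + (NBTOTAL - NBSOLVED) * TIMEOUT * PENALTY
        + NBUNSOLVEDSERIES * SERIESPENALTY; lower is better."""
        nbtotal = sum(len(times) for times in results.values())
        solved = [t for times in results.values()
                  for t in times if t is not None]
        unsolved_series = sum(1 for times in results.values()
                              if all(t is None for t in times))
        return (sum(solved)
                + (nbtotal - len(solved)) * timeout * penalty
                + unsolved_series * seriespenalty)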

SAT 2005 and 2007 purse based scoring, aka Allen's scoring

See Allen's scoring poster or the SAT competition 2007 rules for details.

Note that the purse based system does not obey the SATSPEC09 spirit, since it depends on the set of solvers to be evaluated.

Note: the advantage (and disadvantage) of this scoring is that it aggregates several criteria.



Example: value of the different scorings for the first phase of SAT 2007

In these tables, solvers are ranked according to the first scheme (Lexicographical NBSOLVED, sum ti); it can easily be seen that the other schemes introduce some slight variations. A dash in the last column means that no purse score is available for that solver.

Category "crafted instances" (CRAFTED)

Rank | Solver | Version | Solved instances | Cumulated CPU time on solved instances (s) | sum(time), penalty=10 | sum(time), penalty=100 | sum(time), penalty=1000 | sum(time), penalty=10000 | sum(log(time)), penalty=10 | sum(log(time)), penalty=100 | sum(log(time)), penalty=1000 | sum(log(time)), penalty=10000 | Purse score
1 | minisat | SAT 2007 | 71 | 13105.00 | 1573105.00 | 15613105.00 | 156013105.00 | 1560013105.00 | 1172.96 | 1302.96 | 1432.96 | 1562.96 | 24601
2 | minisat | SAT 2007 (with assertions) | 71 | 13567.60 | 1573567.60 | 15613567.60 | 156013567.60 | 1560013567.60 | 1174.43 | 1304.43 | 1434.43 | 1564.43 | -
3 | SATzilla | FULL | 69 | 11673.72 | 1595673.72 | 15851673.72 | 158411673.72 | 1584011673.72 | 1168.11 | 1300.11 | 1432.11 | 1564.11 | -
4 | SATzilla | CRAFTED | 69 | 12281.44 | 1596281.44 | 15852281.44 | 158412281.44 | 1584012281.44 | 1172.40 | 1304.40 | 1436.40 | 1568.40 | 30336
5 | minimarch | 2007-04-26 (fixed) | 67 | 16433.82 | 1624433.82 | 16096433.82 | 160816433.82 | 1608016433.82 | 1205.41 | 1339.41 | 1473.41 | 1607.41 | -
6 | MiraXT | v3 | 57 | 13929.06 | 1741929.06 | 17293929.06 | 172813929.06 | 1728013929.06 | 1267.47 | 1411.47 | 1555.47 | 1699.47 | 14737
7 | MiraXT | v2 | 55 | 13445.05 | 1765445.05 | 17533445.05 | 175213445.05 | 1752013445.05 | 1281.04 | 1427.04 | 1573.04 | 1719.04 | 10923
8 | MiraXT | v1 | 55 | 14146.51 | 1766146.51 | 17534146.51 | 175214146.51 | 1752014146.51 | 1281.90 | 1427.90 | 1573.90 | 1719.90 | 13792
9 | MXC | 2007-02-08 | 53 | 16084.61 | 1792084.61 | 17776084.61 | 177616084.61 | 1776016084.61 | 1311.03 | 1459.03 | 1607.03 | 1755.03 | 15503
10 | CMUSAT | 2007-02-08 | 52 | 9361.42 | 1797361.42 | 17889361.42 | 178809361.42 | 1788009361.42 | 1293.17 | 1442.17 | 1591.17 | 1740.17 | 13726
11 | picosat | 535 | 46 | 12383.00 | 1872383.00 | 18612383.00 | 186012383.00 | 1860012383.00 | 1337.23 | 1492.23 | 1647.23 | 1802.23 | 10512
12 | SAT7 | 2007-02-08 | 46 | 14162.30 | 1874162.30 | 18614162.30 | 186014162.30 | 1860014162.30 | 1353.44 | 1508.44 | 1663.44 | 1818.44 | 7352
13 | SATzilla | RANDOM | 44 | 8865.79 | 1892865.79 | 18848865.79 | 188408865.79 | 1884008865.79 | 1330.24 | 1487.24 | 1644.24 | 1801.24 | 13112
14 | Barcelogic Fixed | 2007-04-13 | 40 | 8704.21 | 1940704.21 | 19328704.21 | 193208704.21 | 1932008704.21 | 1375.83 | 1536.83 | 1697.83 | 1858.83 | -
15 | TTS | 4.0 | 37 | 3347.89 | 1971347.89 | 19683347.89 | 196803347.89 | 1968003347.89 | 1376.51 | 1540.51 | 1704.51 | 1868.51 | 36457
16 | CMUSAT BASE | 2007-02-08 | 37 | 9957.74 | 1977957.74 | 19689957.74 | 196809957.74 | 1968009957.74 | 1401.59 | 1565.59 | 1729.59 | 1893.59 | 5275
17 | Rsat | 2007-02-08 | 37 | 12315.78 | 1980315.78 | 19692315.78 | 196812315.78 | 1968012315.78 | 1400.37 | 1564.37 | 1728.37 | 1892.37 | 9164
18 | March KS | 2007-02-08 | 33 | 6519.65 | 2022519.65 | 20166519.65 | 201606519.65 | 2016006519.65 | 1397.16 | 1565.16 | 1733.16 | 1901.16 | 29133
19 | Spear FH | 1.0 | 33 | 12790.81 | 2028790.81 | 20172790.81 | 201612790.81 | 2016012790.81 | 1430.79 | 1598.79 | 1766.79 | 1934.79 | -
20 | Spear FHS | 1.0 | 32 | 11814.40 | 2039814.40 | 20291814.40 | 202811814.40 | 2028011814.40 | 1435.91 | 1604.91 | 1773.91 | 1942.91 | -
21 | SAT4J | SAT 2007 | 32 | 12631.54 | 2040631.54 | 20292631.54 | 202812631.54 | 2028012631.54 | 1439.33 | 1608.33 | 1777.33 | 1946.33 | -
22 | Spear | 2007-02-12 | 31 | 11312.68 | 2051312.68 | 20411312.68 | 204011312.68 | 2040011312.68 | 1442.54 | 1612.54 | 1782.54 | 1952.54 | -
23 | TiniSatELite | 2007-02-08 | 27 | 8026.56 | 2096026.56 | 20888026.56 | 208808026.56 | 2088008026.56 | 1457.05 | 1631.05 | 1805.05 | 1979.05 | 3574
24 | tinisat | 2007-02-08 | 25 | 5322.47 | 2117322.47 | 21125322.47 | 211205322.47 | 2112005322.47 | 1472.02 | 1648.02 | 1824.02 | 2000.02 | 3133
25 | DEWSATZ 1A | 2007-02-08 | 20 | 5100.84 | 2177100.84 | 21725100.84 | 217205100.84 | 2172005100.84 | 1502.68 | 1683.68 | 1864.68 | 2045.68 | 4609
26 | DEWSATZ | 2007-04-26 (fixed) | 15 | 3961.61 | 2235961.61 | 22323961.61 | 223203961.61 | 2232003961.61 | 1532.85 | 1718.85 | 1904.85 | 2090.85 | -
27 | KCNFS | SMP | 15 | 5153.27 | 2237153.27 | 22325153.27 | 223205153.27 | 2232005153.27 | 1534.46 | 1720.46 | 1906.46 | 2092.46 | 2221
28 | KCNFS | 2006 | 14 | 3157.38 | 2247157.38 | 22443157.38 | 224403157.38 | 2244003157.38 | 1538.52 | 1725.52 | 1912.52 | 2099.52 | 2142
29 | KCNFS | 2004 | 14 | 3274.50 | 2247274.50 | 22443274.50 | 224403274.50 | 2244003274.50 | 1538.70 | 1725.70 | 1912.70 | 2099.70 | 2148
30 | ornithorynque | 0.1 alpha | 9 | 3296.36 | 2307296.36 | 23043296.36 | 230403296.36 | 2304003296.36 | 1574.55 | 1766.55 | 1958.55 | 2150.55 | -
31 | Hybrid1 | 2007-02-08 | 8 | 229.51 | 2316229.51 | 23160229.51 | 231600229.51 | 2316000229.51 | 1570.82 | 1763.82 | 1956.82 | 2149.82 | 1274
32 | FH | 2007-02-08 | 7 | 261.54 | 2328261.54 | 23280261.54 | 232800261.54 | 2328000261.54 | 1577.75 | 1771.75 | 1965.75 | 2159.75 | 1183
33 | adaptg2wsat | 2007-02-08 | 6 | 199.81 | 2340199.81 | 23400199.81 | 234000199.81 | 2340000199.81 | 1585.82 | 1780.82 | 1975.82 | 2170.82 | -
34 | adaptg2wsat+ | 2007-02-08 | 6 | 478.89 | 2340478.89 | 23400478.89 | 234000478.89 | 2340000478.89 | 1586.20 | 1781.20 | 1976.20 | 2171.20 | 761
35 | ranov | 2007-02-08 | 6 | 2773.86 | 2342773.86 | 23402773.86 | 234002773.86 | 2340002773.86 | 1592.70 | 1787.70 | 1982.70 | 2177.70 | 578
36 | adaptg2wsat0 | 2007-02-08 | 4 | 259.56 | 2364259.56 | 23640259.56 | 236400259.56 | 2364000259.56 | 1599.77 | 1796.77 | 1993.77 | 2190.77 | 568
37 | adaptg2wsatp | 2007-02-08 | 4 | 323.17 | 2364323.17 | 23640323.17 | 236400323.17 | 2364000323.17 | 1601.02 | 1798.02 | 1995.02 | 2192.02 | 357
38 | UnitMarch | 2007-02-08 | 3 | 421.62 | 2376421.62 | 23760421.62 | 237600421.62 | 2376000421.62 | 1607.22 | 1805.22 | 2003.22 | 2201.22 | 581
39 | adaptnovelty | 2007-02-08 | 3 | 1768.33 | 2377768.33 | 23761768.33 | 237601768.33 | 2376001768.33 | 1609.70 | 1807.70 | 2005.70 | 2203.70 | 406
40 | Mmisat | 2007-02-08 | 2 | 757.88 | 2388757.88 | 23880757.88 | 238800757.88 | 2388000757.88 | 1615.01 | 1814.01 | 2013.01 | 2212.01 | 239
41 | saps | 2007-02-08 | 2 | 1590.70 | 2389590.70 | 23881590.70 | 238801590.70 | 2388001590.70 | 1615.73 | 1814.73 | 2013.73 | 2212.73 | 198
42 | gnovelty+ | 2007-02-08 | 1 | 61.94 | 2400061.94 | 24000061.94 | 240000061.94 | 2400000061.94 | 1619.81 | 1819.81 | 2019.81 | 2219.81 | 214
43 | sapsrt | 2007-02-08 | 1 | 147.42 | 2400147.42 | 24000147.42 | 240000147.42 | 2400000147.42 | 1620.19 | 1820.19 | 2020.19 | 2220.19 | 158

Category "industrial instances" (INDUST)

Rank | Solver | Version | Solved instances | Cumulated CPU time on solved instances (s) | sum(time), penalty=10 | sum(time), penalty=100 | sum(time), penalty=1000 | sum(time), penalty=10000 | sum(log(time)), penalty=10 | sum(log(time)), penalty=100 | sum(log(time)), penalty=1000 | sum(log(time)), penalty=10000 | Purse score
1 | Rsat | 2007-02-08 | 106 | 18043.45 | 1554043.45 | 15378043.45 | 153618043.45 | 1536018043.45 | 1216.44 | 1344.44 | 1472.44 | 1600.44 | 44623
2 | TiniSatELite | 2007-02-08 | 103 | 22193.27 | 1594193.27 | 15742193.27 | 157222193.27 | 1572022193.27 | 1243.19 | 1374.19 | 1505.19 | 1636.19 | 30301
3 | picosat | 535 | 103 | 30610.29 | 1602610.29 | 15750610.29 | 157230610.29 | 1572030610.29 | 1273.05 | 1404.05 | 1535.05 | 1666.05 | 25509
4 | Barcelogic Fixed | 2007-04-13 | 99 | 21811.01 | 1641811.01 | 16221811.01 | 162021811.01 | 1620021811.01 | 1269.81 | 1404.81 | 1539.81 | 1674.81 | -
5 | Spear FHS | 1.0 | 99 | 22610.38 | 1642610.38 | 16222610.38 | 162022610.38 | 1620022610.38 | 1278.38 | 1413.38 | 1548.38 | 1683.38 | -
6 | minisat | SAT 2007 | 97 | 20931.03 | 1664931.03 | 16460931.03 | 164420931.03 | 1644020931.03 | 1270.50 | 1407.50 | 1544.50 | 1681.50 | 29714
7 | minisat | SAT 2007 (with assertions) | 96 | 20540.63 | 1676540.63 | 16580540.63 | 165620540.63 | 1656020540.63 | 1281.22 | 1419.22 | 1557.22 | 1695.22 | -
8 | Spear FH | 1.0 | 93 | 20041.31 | 1712041.31 | 16940041.31 | 169220041.31 | 1692020041.31 | 1317.36 | 1458.36 | 1599.36 | 1740.36 | -
9 | MiraXT | v1 | 91 | 18849.09 | 1734849.09 | 17178849.09 | 171618849.09 | 1716018849.09 | 1309.36 | 1452.36 | 1595.36 | 1738.36 | 23844
10 | tinisat | 2007-02-08 | 90 | 21826.71 | 1749826.71 | 17301826.71 | 172821826.71 | 1728021826.71 | 1334.13 | 1478.13 | 1622.13 | 1766.13 | 20237
11 | CMUSAT | 2007-02-08 | 89 | 17871.18 | 1757871.18 | 17417871.18 | 174017871.18 | 1740017871.18 | 1320.43 | 1465.43 | 1610.43 | 1755.43 | 25334
12 | MiraXT | v3 | 89 | 20671.20 | 1760671.20 | 17420671.20 | 174020671.20 | 1740020671.20 | 1324.88 | 1469.88 | 1614.88 | 1759.88 | 26590
13 | minimarch | 2007-04-26 (fixed) | 88 | 18993.44 | 1770993.44 | 17538993.44 | 175218993.44 | 1752018993.44 | 1328.83 | 1474.83 | 1620.83 | 1766.83 | -
14 | SATzilla | FULL | 88 | 29939.41 | 1781939.41 | 17549939.41 | 175229939.41 | 1752029939.41 | 1382.15 | 1528.15 | 1674.15 | 1820.15 | -
15 | MXC | 2007-02-08 | 84 | 18692.39 | 1818692.39 | 18018692.39 | 180018692.39 | 1800018692.39 | 1372.34 | 1522.34 | 1672.34 | 1822.34 | 19170
16 | MiraXT | v2 | 83 | 17916.25 | 1829916.25 | 18137916.25 | 181217916.25 | 1812017916.25 | 1358.72 | 1509.72 | 1660.72 | 1811.72 | 18819
17 | Spear | 2007-02-12 | 82 | 15355.45 | 1839355.45 | 18255355.45 | 182415355.45 | 1824015355.45 | 1386.28 | 1538.28 | 1690.28 | 1842.28 | -
18 | SATzilla | CRAFTED | 81 | 26772.70 | 1862772.70 | 18386772.70 | 183626772.70 | 1836026772.70 | 1420.13 | 1573.13 | 1726.13 | 1879.13 | 16722
19 | CMUSAT BASE | 2007-02-08 | 80 | 22286.33 | 1870286.33 | 18502286.33 | 184822286.33 | 1848022286.33 | 1407.01 | 1561.01 | 1715.01 | 1869.01 | 18726
20 | SAT7 | 2007-02-08 | 69 | 21188.66 | 2001188.66 | 19821188.66 | 198021188.66 | 1980021188.66 | 1475.85 | 1640.85 | 1805.85 | 1970.85 | 11449
21 | SATzilla | RANDOM | 66 | 23303.59 | 2039303.59 | 20183303.59 | 201623303.59 | 2016023303.59 | 1508.92 | 1676.92 | 1844.92 | 2012.92 | 14932
22 | SAT4J JVM PARAM CHANGED | SAT 2007 | 66 | 23870.23 | 2039870.23 | 20183870.23 | 201623870.23 | 2016023870.23 | 1499.78 | 1667.78 | 1835.78 | 2003.78 | -
23 | SAT4J | SAT 2007 | 57 | 19670.66 | 2143670.66 | 21259670.66 | 212419670.66 | 2124019670.66 | 1549.29 | 1726.29 | 1903.29 | 2080.29 | -
24 | ornithorynque | 0.1 alpha | 33 | 5553.22 | 2417553.22 | 24125553.22 | 241205553.22 | 2412005553.22 | 1682.43 | 1883.43 | 2084.43 | 2285.43 | -
25 | DEWSATZ 1A | 2007-02-08 | 22 | 6010.53 | 2550010.53 | 25446010.53 | 254406010.53 | 2544006010.53 | 1761.92 | 1973.92 | 2185.92 | 2397.92 | 2590
26 | DEWSATZ | 2007-04-26 (fixed) | 19 | 5044.84 | 2585044.84 | 25805044.84 | 258005044.84 | 2580005044.84 | 1779.58 | 1994.58 | 2209.58 | 2424.58 | -
27 | KCNFS | 2006 | 16 | 2701.24 | 2618701.24 | 26162701.24 | 261602701.24 | 2616002701.24 | 1790.60 | 2008.60 | 2226.60 | 2444.60 | 3029
28 | March KS | 2007-02-08 | 12 | 2977.72 | 2666977.72 | 26642977.72 | 266402977.72 | 2664002977.72 | 1815.36 | 2037.36 | 2259.36 | 2481.36 | 1847
29 | KCNFS | SMP | 10 | 1798.24 | 2689798.24 | 26881798.24 | 268801798.24 | 2688001798.24 | 1823.52 | 2047.52 | 2271.52 | 2495.52 | 2876
30 | adaptg2wsat+ | 2007-02-08 | 9 | 1458.57 | 2701458.57 | 27001458.57 | 270001458.57 | 2700001458.57 | 1834.81 | 2059.81 | 2284.81 | 2509.81 | 1638
31 | adaptg2wsat | 2007-02-08 | 8 | 1288.02 | 2713288.02 | 27121288.02 | 271201288.02 | 2712001288.02 | 1840.70 | 2066.70 | 2292.70 | 2518.70 | -
32 | FH | 2007-02-08 | 8 | 1810.91 | 2713810.91 | 27121810.91 | 271201810.91 | 2712001810.91 | 1842.03 | 2068.03 | 2294.03 | 2520.03 | 1871
33 | adaptg2wsat0 | 2007-02-08 | 7 | 681.15 | 2724681.15 | 27240681.15 | 272400681.15 | 2724000681.15 | 1846.37 | 2073.37 | 2300.37 | 2527.37 | 706
34 | adaptg2wsatp | 2007-02-08 | 7 | 1145.09 | 2725145.09 | 27241145.09 | 272401145.09 | 2724001145.09 | 1845.92 | 2072.92 | 2299.92 | 2526.92 | 794
35 | Hybrid1 | 2007-02-08 | 7 | 1380.60 | 2725380.60 | 27241380.60 | 272401380.60 | 2724001380.60 | 1847.35 | 2074.35 | 2301.35 | 2528.35 | 669
36 | ranov | 2007-02-08 | 6 | 2025.90 | 2738025.90 | 27362025.90 | 273602025.90 | 2736002025.90 | 1856.46 | 2084.46 | 2312.46 | 2540.46 | 801
37 | UnitMarch | 2007-02-08 | 4 | 1734.50 | 2761734.50 | 27601734.50 | 276001734.50 | 2760001734.50 | 1870.26 | 2100.26 | 2330.26 | 2560.26 | 356
38 | TTS | 4.0 | 3 | 13.40 | 2772013.40 | 27720013.40 | 277200013.40 | 2772000013.40 | 1871.02 | 2102.02 | 2333.02 | 2564.02 | 285
39 | adaptnovelty | 2007-02-08 | 3 | 1143.10 | 2773143.10 | 27721143.10 | 277201143.10 | 2772001143.10 | 1876.27 | 2107.27 | 2338.27 | 2569.27 | 261
40 | KCNFS | 2004 | 2 | 6.57 | 2784006.57 | 27840006.57 | 278400006.57 | 2784000006.57 | 1878.01 | 2110.01 | 2342.01 | 2574.01 | 561
41 | gnovelty+ | 2007-02-08 | 2 | 575.18 | 2784575.18 | 27840575.18 | 278400575.18 | 2784000575.18 | 1880.97 | 2112.97 | 2344.97 | 2576.97 | 240
42 | saps | 2007-02-08 | 2 | 728.40 | 2784728.40 | 27840728.40 | 278400728.40 | 2784000728.40 | 1881.87 | 2113.87 | 2345.87 | 2577.87 | 215
43 | sapsrt | 2007-02-08 | 2 | 1291.78 | 2785291.78 | 27841291.78 | 278401291.78 | 2784001291.78 | 1882.12 | 2114.12 | 2346.12 | 2578.12 | 202
44 | Mmisat | 2007-02-08 | 0 | 0.00 | 2808000.00 | 28080000.00 | 280800000.00 | 2808000000.00 | 1893.08 | 2127.08 | 2361.08 | 2595.08 | 0

Category "random instances" (RANDOM)

Rank | Solver | Version | Solved instances | Cumulated CPU time on solved instances (s) | sum(time), penalty=10 | sum(time), penalty=100 | sum(time), penalty=1000 | sum(time), penalty=10000 | sum(log(time)), penalty=10 | sum(log(time)), penalty=100 | sum(log(time)), penalty=1000 | sum(log(time)), penalty=10000 | Purse score
1 | adaptg2wsat0 | 2007-02-08 | 231 | 25875.21 | 3385875.21 | 33625875.21 | 336025875.21 | 3360025875.21 | 2478.40 | 2758.40 | 3038.40 | 3318.40 | 68240
2 | adaptg2wsat+ | 2007-02-08 | 227 | 22724.01 | 3430724.01 | 34102724.01 | 340822724.01 | 3408022724.01 | 2505.87 | 2789.87 | 3073.87 | 3357.87 | 63729
3 | Hybrid1 | 2007-02-08 | 226 | 24345.15 | 3444345.15 | 34224345.15 | 342024345.15 | 3420024345.15 | 2511.29 | 2796.29 | 3081.29 | 3366.29 | 61922
4 | adaptg2wsat | 2007-02-08 | 222 | 20570.71 | 3488570.71 | 34700570.71 | 346820570.71 | 3468020570.71 | 2538.35 | 2827.35 | 3116.35 | 3405.35 | -
5 | adaptg2wsatp | 2007-02-08 | 222 | 26237.69 | 3494237.69 | 34706237.69 | 346826237.69 | 3468026237.69 | 2543.44 | 2832.44 | 3121.44 | 3410.44 | 57502
6 | FH | 2007-02-08 | 220 | 22428.66 | 3514428.66 | 34942428.66 | 349222428.66 | 3492022428.66 | 2553.08 | 2844.08 | 3135.08 | 3426.08 | 57098
7 | adaptnovelty | 2007-02-08 | 218 | 27987.63 | 3543987.63 | 35187987.63 | 351627987.63 | 3516027987.63 | 2577.05 | 2870.05 | 3163.05 | 3456.05 | 59472
8 | ranov | 2007-02-08 | 216 | 27044.14 | 3567044.14 | 35427044.14 | 354027044.14 | 3540027044.14 | 2611.27 | 2906.27 | 3201.27 | 3496.27 | 45425
9 | SATzilla | RANDOM | 215 | 31979.77 | 3583979.77 | 35551979.77 | 355231979.77 | 3552031979.77 | 2682.94 | 2978.94 | 3274.94 | 3570.94 | 37613
10 | March KS | 2007-02-08 | 208 | 36762.87 | 3672762.87 | 36396762.87 | 363636762.87 | 3636036762.87 | 2793.81 | 3096.81 | 3399.81 | 3702.81 | 48842
11 | SATzilla | FULL | 205 | 23696.12 | 3695696.12 | 36743696.12 | 367223696.12 | 3672023696.12 | 2731.68 | 3037.68 | 3343.68 | 3649.68 | -
12 | gnovelty+ | 2007-02-08 | 203 | 21409.29 | 3717409.29 | 36981409.29 | 369621409.29 | 3696021409.29 | 2656.19 | 2964.19 | 3272.19 | 3580.19 | 69635
13 | KCNFS | 2004 | 191 | 40265.13 | 3880265.13 | 38440265.13 | 384040265.13 | 3840040265.13 | 2915.30 | 3235.30 | 3555.30 | 3875.30 | 40312
14 | SATzilla | CRAFTED | 189 | 24417.95 | 3888417.95 | 38664417.95 | 386424417.95 | 3864024417.95 | 2835.54 | 3157.54 | 3479.54 | 3801.54 | 23667
15 | KCNFS | 2006 | 189 | 38423.92 | 3902423.92 | 38678423.92 | 386438423.92 | 3864038423.92 | 2929.89 | 3251.89 | 3573.89 | 3895.89 | 38670
16 | saps | 2007-02-08 | 169 | 6197.60 | 4110197.60 | 41046197.60 | 410406197.60 | 4104006197.60 | 2855.58 | 3197.58 | 3539.58 | 3881.58 | 32928
17 | sapsrt | 2007-02-08 | 167 | 7135.46 | 4135135.46 | 41287135.46 | 412807135.46 | 4128007135.46 | 2865.69 | 3209.69 | 3553.69 | 3897.69 | 37639
18 | KCNFS | SMP | 165 | 35051.84 | 4187051.84 | 41555051.84 | 415235051.84 | 4152035051.84 | 3089.13 | 3435.13 | 3781.13 | 4127.13 | 37029
19 | DEWSATZ | 2007-04-26 (fixed) | 142 | 38365.15 | 4466365.15 | 44318365.15 | 442838365.15 | 4428038365.15 | 3263.90 | 3632.90 | 4001.90 | 4370.90 | -
20 | minimarch | 2007-04-26 (fixed) | 125 | 36271.09 | 4668271.09 | 46356271.09 | 463236271.09 | 4632036271.09 | 3367.63 | 3753.63 | 4139.63 | 4525.63 | -
21 | MXC | 2007-02-08 | 105 | 28842.42 | 4900842.42 | 48748842.42 | 487228842.42 | 4872028842.42 | 3492.69 | 3898.69 | 4304.69 | 4710.69 | 10478
22 | minisat | SAT 2007 | 101 | 27872.79 | 4947872.79 | 49227872.79 | 492027872.79 | 4920027872.79 | 3517.53 | 3927.53 | 4337.53 | 4747.53 | 9724
23 | minisat | SAT 2007 (with assertions) | 100 | 27622.24 | 4959622.24 | 49347622.24 | 493227622.24 | 4932027622.24 | 3523.80 | 3934.80 | 4345.80 | 4756.80 | -
24 | SAT7 | 2007-02-08 | 95 | 28423.19 | 5020423.19 | 49948423.19 | 499228423.19 | 4992028423.19 | 3562.23 | 3978.23 | 4394.23 | 4810.23 | 8384
25 | DEWSATZ 1A | 2007-02-08 | 84 | 24117.97 | 5148117.97 | 51264117.97 | 512424117.97 | 5124024117.97 | 3629.86 | 4056.86 | 4483.86 | 4910.86 | 8202
26 | SAT4J | SAT 2007 | 78 | 19580.72 | 5215580.72 | 51979580.72 | 519619580.72 | 5196019580.72 | 3660.94 | 4093.94 | 4526.94 | 4959.94 | -
27 | MiraXT | v3 | 78 | 25731.14 | 5221731.14 | 51985731.14 | 519625731.14 | 5196025731.14 | 3667.17 | 4100.17 | 4533.17 | 4966.17 | 6659
28 | MiraXT | v1 | 74 | 18904.06 | 5262904.06 | 52458904.06 | 524418904.06 | 5244018904.06 | 3680.93 | 4117.93 | 4554.93 | 4991.93 | 6587
29 | MiraXT | v2 | 68 | 24496.65 | 5340496.65 | 53184496.65 | 531624496.65 | 5316024496.65 | 3734.36 | 4177.36 | 4620.36 | 5063.36 | 5536
30 | picosat | 535 | 67 | 19041.69 | 5347041.69 | 53299041.69 | 532819041.69 | 5328019041.69 | 3723.07 | 4167.07 | 4611.07 | 5055.07 | 5555
31 | UnitMarch | 2007-02-08 | 62 | 17324.63 | 5405324.63 | 53897324.63 | 538817324.63 | 5388017324.63 | 3756.27 | 4205.27 | 4654.27 | 5103.27 | 5811
32 | CMUSAT BASE | 2007-02-08 | 48 | 9710.38 | 5565710.38 | 55569710.38 | 555609710.38 | 5556009710.38 | 3831.78 | 4294.78 | 4757.78 | 5220.78 | 3922
33 | CMUSAT | 2007-02-08 | 46 | 9128.25 | 5589128.25 | 55809128.25 | 558009128.25 | 5580009128.25 | 3844.26 | 4309.26 | 4774.26 | 5239.26 | 3766
34 | Rsat | 2007-02-08 | 43 | 7117.74 | 5623117.74 | 56167117.74 | 561607117.74 | 5616007117.74 | 3862.56 | 4330.56 | 4798.56 | 5266.56 | 3603
35 | Spear FHS | 1.0 | 40 | 12725.50 | 5664725.50 | 56532725.50 | 565212725.50 | 5652012725.50 | 3892.38 | 4363.38 | 4834.38 | 5305.38 | -
36 | Spear FH | 1.0 | 39 | 12585.65 | 5676585.65 | 56652585.65 | 566412585.65 | 5664012585.65 | 3897.16 | 4369.16 | 4841.16 | 5313.16 | -
37 | Spear | 2007-02-12 | 38 | 7746.95 | 5683746.95 | 56767746.95 | 567607746.95 | 5676007746.95 | 3897.86 | 4370.86 | 4843.86 | 5316.86 | -
38 | TiniSatELite | 2007-02-08 | 32 | 9104.94 | 5757104.94 | 57489104.94 | 574809104.94 | 5748009104.94 | 3933.55 | 4412.55 | 4891.55 | 5370.55 | 2939
39 | tinisat | 2007-02-08 | 30 | 5302.66 | 5777302.66 | 57725302.66 | 577205302.66 | 5772005302.66 | 3937.76 | 4418.76 | 4899.76 | 5380.76 | 2884
40 | Barcelogic Fixed | 2007-04-13 | 29 | 6332.62 | 5790332.62 | 57846332.62 | 578406332.62 | 5784006332.62 | 3953.36 | 4435.36 | 4917.36 | 5399.36 | -
41 | ornithorynque | 0.1 alpha | 11 | 1459.23 | 6001459.23 | 60001459.23 | 600001459.23 | 6000001459.23 | 4064.36 | 4564.36 | 5064.36 | 5564.36 | -
42 | Mmisat | 2007-02-08 | 1 | 426.37 | 6120426.37 | 61200426.37 | 612000426.37 | 6120000426.37 | 4128.57 | 4638.57 | 5148.57 | 5658.57 | 130
43 | TTS | 4.0 | 0 | 0.00 | 6132000.00 | 61320000.00 | 613200000.00 | 6132000000.00 | 4134.03 | 4645.03 | 5156.03 | 5667.03 | 0
