A Unified Approach for Solving Sequential Selection Problems

In this paper we develop a unified approach for solving a wide class of sequential selection problems. This class includes, but is not limited to, selection problems with no-information, rank-dependent rewards, and considers both fixed as well as random problem horizons. The proposed framework is based on a reduction of the original selection problem to one of optimal stopping for a sequence of judiciously constructed independent random variables. We demonstrate that our approach allows exact and efficient computation of optimal policies and various performance metrics thereof for a variety of sequential selection problems, several of which have not been solved to date.


Introduction
In sequential selection problems a decision maker examines a sequence of observations which appear in random order over some horizon. Each observation can be either accepted or rejected, and these decisions are irrevocable. The objective is to select an element in this sequence to optimize a given criterion. A classical example is the so-called secretary problem in which the objective is to maximize the probability of selecting the element of the sequence that ranks highest. The existing literature contains numerous settings and formulations of such problems, see, e.g., Gilbert and Mosteller (1966), Freeman (1983), Berezovsky & Gnedin (1984), Ferguson (1989), Samuels (1991) and Ferguson (2008); to make more concrete connections we defer further references to the subsequent section where we formulate the class of problems more precisely.
Sequential selection problems are typically solved using the principles of dynamic programming, relying heavily on problem-specific structure and focusing on theoretical properties of the optimal solution; cf. Gilbert and Mosteller (1966), Berezovsky & Gnedin (1984) and Ferguson (2008). Consequently, it has become increasingly difficult to discern common principles underlying the solutions of different problem instances.

Sequential selection problems

Let us introduce some notation and terminology. Let X_1, X_2, … be an infinite sequence of independent identically distributed continuous random variables defined on a probability space (Ω, F, P). Let R_t be the relative rank of X_t and A_{t,n} the absolute rank of X_t among the first n observations (which we also refer to as the problem horizon):

R_t := Σ_{j=1}^{t} 1(X_t ≤ X_j),  A_{t,n} := Σ_{j=1}^{n} 1(X_t ≤ X_j),  t = 1, …, n.  (1)

Note that with this notation the largest observation has absolute rank one, and R_t = A_{t,t} for any t. Let R_t := σ(R_1, …, R_t) and X_t := σ(X_1, …, X_t) denote the σ-fields generated by R_1, …, R_t and X_1, …, X_t, respectively; R = (R_t, 1 ≤ t ≤ n) and X = (X_t, 1 ≤ t ≤ n) are the corresponding filtrations. In general, the class of all stopping times of a filtration Y = (Y_t, 1 ≤ t ≤ n) will be denoted T(Y); i.e., τ ∈ T(Y) if {τ = t} ∈ Y_t for all 1 ≤ t ≤ n. Sequential selection problems are classified according to the information available to the decision maker and the structure of the reward function. The settings in which only the relative ranks {R_t} are observed are usually referred to as no-information problems, whereas full information refers to the case when the random variables {X_t} themselves can be observed and their distribution is known. In addition, the total number of available observations n can be either fixed or random with a given distribution. These cases are referred to as problems with fixed and random horizon, respectively.

Problems with fixed horizon
In this paper we mainly consider selection problems with no-information and rank-dependent reward. The prototypical sequential selection problem with fixed horizon, no-information and rank-dependent reward is formulated as follows; see, e.g., Gnedin & Krengel (1996).
The objective is to find the rule τ* ∈ T(R) satisfying V*_n(q) := max_{τ∈T(R)} V_n(q; τ) = E q(A_{τ*,n}), where V_n(q; τ) := E q(A_{τ,n}), and to compute the optimal value V*_n(q).
Depending on the reward function q we distinguish among the following types of sequential selection problems with fixed horizon.
Best-choice problems. The settings in which the reward function is an indicator are usually referred to as best-choice stopping problems. Of special note are the following.
(P1). Classical secretary problem. This problem setting corresponds to the case q(a) = q_csp(a) := 1{a = 1}. Here we want to maximize the probability P{A_{τ,n} = 1} of selecting the best alternative over all stopping times τ from T(R). It is well known that the optimal policy passes on approximately the first n/e observations and selects the first subsequent observation that is superior to all previous ones, if such an observation exists; otherwise the last element in the sequence is selected. The limiting optimal value is lim_{n→∞} V*_n(q_csp) = 1/e (Lindley 1961, Dynkin 1963, Gilbert and Mosteller 1966). Ferguson (1989) reviews the problem history and discusses how different assumptions about this problem evolved over time.
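As a quick illustration of (P1), the win probability of the cutoff rule "pass the first r − 1 observations, then take the first relative best" is P(r) = ((r − 1)/n) Σ_{t=r}^{n} 1/(t − 1), and maximizing over r recovers the n/e cutoff. A minimal sketch (the function name is ours):

```python
def secretary_cutoff_value(n):
    # win probability of the rule "pass the first r-1, then take the first
    # relative best": P(r) = ((r-1)/n) * sum_{t=r}^{n} 1/(t-1), P(1) = 1/n
    tail = 0.0                        # suffix sum of 1/(t-1) for t = r..n
    best_r, best_p = 1, 1.0 / n
    for r in range(n, 1, -1):
        tail += 1.0 / (r - 1)
        p = (r - 1) / n * tail
        if p >= best_p:
            best_r, best_p = r, p
    return best_r, best_p
```

For n = 4 this gives the cutoff r = 2 with value 11/24; for large n both the normalized cutoff and the value approach 1/e.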
(P2). Selecting one of the k best values. The problem is usually referred to as the Gusein-Zade stopping problem (Gusein-Zade 1966, Frank & Samuels 1980). Here q(a) = q_gz^{(k)}(a) := 1{a ≤ k}, and the problem is to maximize P{A_{τ,n} ≤ k} with respect to τ ∈ T(R). The optimal policy was characterized in Gusein-Zade (1966). It is determined by k natural numbers 1 ≤ π_1 ≤ ⋯ ≤ π_k and proceeds as follows: pass the first π_1 − 1 observations and among the subsequent π_1, π_1 + 1, …, π_2 − 1 observations choose the first observation with relative rank one; if no such observation exists, then among the observations π_2, π_2 + 1, …, π_3 − 1 choose the first one with relative rank at most two, etc. Gusein-Zade (1966) presented a dynamic programming algorithm to determine the numbers π_1, …, π_k and the value of V*_n(q_gz^{(k)}). He also studied the limiting behavior of the numbers π_1, …, π_k as the problem horizon grows large, and showed that lim_{n→∞} V*_n(q_gz^{(2)}) ≈ 0.574. Exact results for the case k = 3 are given in Quine & Law (1996) and for general k in Woryna (2017).
Based on general asymptotic results of Mucci (1973), Frank & Samuels (1980) computed numerically lim_{n→∞} V*_n(q_gz^{(k)}) for a range of values of k. The recent paper Dietz et al. (2011) studies some approximate policies.
(P3). Selecting the kth best alternative. In this problem q(a) = q_pd^{(k)}(a) := 1{a = k}, i.e., we want to maximize the probability of selecting the kth best candidate. The problem was explicitly solved for k = 2 by Szajowski (1982), Rose (1982a) and Vanderbei (2012); the last paper coined the name the postdoc problem for this setting. An optimal policy for k = 2 is to reject the first ⌈n/2⌉ observations and then select the one which is second best relative to the previous observations, if it exists; otherwise the last element in the sequence is selected. The optimal value is V*_n(q_pd^{(2)}) = (n + 1)/(4n) if n is odd and V*_n(q_pd^{(2)}) = n/(4(n − 1)) if n is even. An optimal stopping rule for the case k = 3 and some results on the optimal value were reported recently in Lin et al. (2019). We are not aware of results on the optimal policy and exact computation of the optimal values for general n and k. Recently approximate policies were developed in Bruss & Louchard (2016). The problem of selecting the median value k = (n + 1)/2, where n is odd, was considered in Rose (1982b); it is shown there that lim_{n→∞} V*_n(q_pd^{((n+1)/2)}) = 0.

Expected rank type problems. To this category we attribute problems with reward q which is not an indicator function.

(P4). Minimization of the expected rank. In this problem the goal is to minimize E A_{τ,n} with respect to τ ∈ T(R). If we put q(a) = q_er(a) := −a then min_{τ∈T(R)} E A_{τ,n} = −V*_n(q_er). This problem was discussed heuristically by Lindley (1961) and solved by Chow et al. (1964). It was shown there that

lim_{n→∞} min_{τ∈T(R)} E A_{τ,n} = Π_{j=1}^{∞} (1 + 2/j)^{1/(j+1)} ≈ 3.8695.
The corresponding optimal stopping rule is given by backward induction relations. A simple suboptimal stopping rule which is close to the optimal one was proposed in Krieger & Samuel-Cahn (2009).
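The backward induction for (P4) is short enough to state concretely: using E[A_{t,n} | R_t = r] = r(n + 1)/(t + 1) and the uniformity of R_t on {1, …, t}, the optimal expected rank for a fixed horizon n can be computed in O(n²) time. A sketch under these assumptions (the function name is ours):

```python
def min_expected_rank(n):
    # optimal expected rank via backward induction; at time t the relative
    # rank R_t is uniform on {1..t} and E[A_{t,n} | R_t = r] = r(n+1)/(t+1)
    v = (n + 1) / 2.0                  # forced to accept the last item
    for t in range(n - 1, 0, -1):
        v = sum(min(r * (n + 1) / (t + 1), v) for r in range(1, t + 1)) / t
    return v
```

The values increase with n toward the Chow et al. (1964) limit 3.8695; e.g., the exact value for n = 3 is 5/3.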
(P5). Minimization of the expected squared rank. Based on Chow et al. (1964), Robbins (1991) developed the optimal policy and computed the asymptotic optimal value in the problem of minimization of E[A_{τ,n}(A_{τ,n} + 1) ⋯ (A_{τ,n} + k − 1)] with respect to τ ∈ T(R); in particular, he derived the limiting value achieved by the optimal stopping rule τ*. Robbins (1991) also discussed the problem of minimization of E A²_{τ,n} over τ ∈ T(R) and mentioned that the optimal stopping rule and optimal value are unknown. As we will demonstrate below, optimal policies for any problem of this type can be easily derived, and the corresponding optimal values are straightforwardly calculated for any fixed n.

Problems with random horizon
In Problem (A1) and specific problem instances of Section 2.1 the horizon n is fixed beforehand, and optimal policies depend critically on this assumption. However, in practical situations n may be unknown. This fact motivates settings in which the horizon is assumed to be a random variable.
If the horizon is random then the selection may not have been made by the time the observation process terminates. In order to take this possibility into account, we introduce minor modifications in the definitions of the absolute and relative ranks in (1). By convention we put A_{t,k} = 0 for t > k, and if N is a positive random variable representing the problem horizon and taking values in {1, …, N_max} (N_max can be infinite) then on the event {N = k}, k = 1, …, N_max, we set

Ā_t := A_{t,k},  R̄_t := R_t 1{t ≤ k},  t = 1, …, N_max.

Furthermore, R̄_t := σ(R̄_1, …, R̄_t) denotes the σ-field induced by (R̄_1, …, R̄_t), and R̄ = {R̄_t, 1 ≤ t ≤ N_max} is the corresponding filtration. We refer to the sequence {R̄_t, 1 ≤ t ≤ N_max} as the sequence of observed relative ranks. The general selection problem with random horizon, no-information and rank-dependent reward is formulated as follows [see Presman and Sonin (1972) and Irle (1980)].
The introduced model assigns a fictitious zero value to the observed relative rank R̄_t if the selection has not been made by the end of the problem horizon, i.e., if t > N. By the assumption q(0) = 0 the reward for not selecting any observation by time N is also set to zero, though other possibilities can be considered for this value.
In principle, all problems (P1)-(P5) discussed above can be formulated and solved under the assumption that the observation horizon is random. Below we discuss the following three problem instances.
(P6). Classical secretary problem with random horizon. The classical secretary problem with random horizon N corresponds to Problem (A2) with q(a) = 1{a = 1}; it was studied in Presman and Sonin (1972). In Problem (P1), where n is fixed, the stopping region is an interval of the form {k_n, …, n} for some integer k_n. In contrast to (P1), Presman and Sonin (1972) show that for general distributions of N the optimal policy can involve "islands," i.e., the stopping region can be a union of several disjoint intervals. The paper derives some sufficient conditions under which the stopping region is a single interval and presents specific examples satisfying these conditions. In particular, it is shown that in the case of the uniform distribution on {1, …, N_max}, i.e., γ_k = 1/N_max, k = 1, …, N_max, the stopping region is of the form {k_{N_max}, …, N_max}. A characterization of optimal policies for general distributions of N is not available in the existing literature.
(P7). Selecting one of the k best values over a random horizon. This is a version of the Gusein-Zade stopping problem, Problem (P2), with random horizon. Recall that here the reward function is q_gz^{(k)}(a) = 1{a ≤ k}. To the best of our knowledge, this setting has been studied only for k = 2 and the uniform distribution of N, i.e., γ_k = 1/N_max, k ∈ {1, …, N_max}; see Kawai & Tamaki (2003). The cited paper derives the optimal policy and demonstrates that it is qualitatively the same as in the setting with fixed horizon. Kawai & Tamaki (2003) study asymptotics of the thresholds π_1 and π_2, and compute numerically the problem optimal value for a range of N_max's; in particular, lim_{N_max→∞} V*_γ(q_gz^{(2)}) ≈ 0.4038. Below we show how this problem can be stated and solved for general k and arbitrary distribution of N within our proposed unified framework.
(P8). Minimization of the expected rank over a random horizon. Consider a variant of Problem (P4) under the assumption that the horizon is a random variable N with known distribution. In this setting the loss (the negative reward) for stopping at time t is the absolute rank A t,N on the event {N ≥ t}; otherwise, the absolute rank of the last available observation A N,N = R N is received. We want to minimize the expected loss over all stopping rules τ ∈ T (R). This problem has been considered in Gianini-Pettitt (1979). In particular, it was shown there that if N is uniformly distributed over {1, . . . , N max } then the expected loss tends to infinity as N max → ∞. On the other hand, for distributions which are more "concentrated" around N max , the optimal value coincides asymptotically with the one for Problem (P4). Below we demonstrate that this problem can be naturally formulated and solved for general distributions of N using our proposed unified framework; the details are given in Section 5.2.3.

Multiple choice problems
The proposed framework is also applicable to some multiple choice problems, both with fixed and random horizons. Below we review two settings with fixed horizon.
(P9). Maximizing the probability of selecting the best observation with k choices. Assume that one can make k selections, and the reward function equals one if the best observation belongs to the selected subset and zero otherwise. Formally, the problem is to maximize the probability P(∪_{j=1}^{k} {A_{τ_j,n} = 1}) over stopping times τ_1 < ⋯ < τ_k from T(R). This problem has been considered in Gilbert and Mosteller (1966), who gave numerical results for up to k = 8; see also Haggstrom (1967) for theoretical results for k = 2.
(P10). Minimization of the expected average rank. Assume that k choices are possible, and the goal is to minimize the expected average rank of the selected subset. Formally, the problem is to minimize (1/k) E Σ_{j=1}^{k} A_{τ_j,n} over stopping times τ_1 < ⋯ < τ_k of T(R). For related results we refer to Ajtai et al. (2001), Krieger et al. (2008), Krieger et al. (2007) and Nikolaev & Sofronov (2007).

Miscellaneous problems
The proposed framework extends beyond problems with rank-dependent rewards and no-information. The next two problem instances demonstrate such extensions.
(P11). Moser's problem with random horizon. Let {X_t, t ≥ 1} be a sequence of independent identically distributed random variables with distribution G and expectation μ. Let N be a positive integer-valued random variable representing the problem horizon. We observe X_1, X_2, … sequentially, and the reward for stopping at time t is the value of the observed random variable X_t; if stopping does not occur by the problem horizon N, then the reward is the last observation X_N. Formally, we want to maximize E X_{τ∧N} with respect to all stopping times τ of the filtration associated with the observed values. The formulation with fixed N = n and X_t's uniformly distributed on [0, 1] corresponds to the classical problem of Moser (1956).
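In the classical fixed-horizon case with uniform observations, the optimal value obeys the recursion v_{k+1} = E max(X, v_k) = (1 + v_k²)/2 with v_1 = 1/2, where k is the number of observations still to come. A minimal sketch of this special case (the function name is ours):

```python
def moser_value(n):
    # v = optimal expected reward with k observations left, X ~ U[0,1]:
    # v_1 = E[X] = 1/2 and v_{k+1} = E[max(X, v_k)] = (1 + v_k**2) / 2
    v = 0.5
    for _ in range(n - 1):
        v = (1 + v * v) / 2
    return v
```

For example, moser_value(2) = 0.625, reflecting the rule "accept X_1 iff X_1 > 1/2".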
(P12). Bruss' Odds-Theorem. Bruss (2000) considered the following optimal stopping problem. Let Z_1, …, Z_n be independent Bernoulli random variables with success probabilities p_1, …, p_n respectively. We observe Z_1, Z_2, … sequentially and want to stop at the time of the last success, i.e., the problem is to find a stopping time τ ∈ T(Z) such that the probability P(Z_τ = 1, Z_{τ+1} = Z_{τ+2} = ⋯ = Z_n = 0) is maximized. The Odds-Theorem (Bruss 2000, Theorem 1) states that it is optimal to stop at the first time instance t such that Z_t = 1 and

t ≥ t* := sup{1, sup{k = 1, …, n : Σ_{j=k}^{n} p_j/q_j ≥ 1}},

with q_j := 1 − p_j and sup{∅} := −∞. This statement has been used in various settings for finding optimal stopping policies. For example, it provides the shortest self-contained solution to the classical secretary problem (Bruss 2000). For some extensions to multiple stopping problems see Matsui & Ano (2016) and references therein. We also refer to the recent work Bruss (2019) where further relevant references can be found. In what follows we will demonstrate that Bruss' Odds-Theorem can be derived using the proposed framework.
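The threshold t* of the Odds-Theorem is a one-pass computation over the odds p_j/q_j, scanned from the end; a sketch (the function name is ours). Applied to the secretary problem, where Z_t = 1{R_t = 1} and p_t = 1/t, it reproduces the classical cutoff:

```python
def odds_threshold(p):
    # Odds-Theorem threshold: t* = sup{1, sup{k : sum_{j=k}^n p_j/q_j >= 1}},
    # q_j = 1 - p_j; scan the odds from the end, stop once the sum reaches 1
    s = 0.0
    for k in range(len(p), 0, -1):
        q = 1.0 - p[k - 1]
        s += p[k - 1] / q if q > 0 else float("inf")
        if s >= 1:
            return k
    return 1
```

For n = 100 and p_t = 1/t this returns 38: stop at the first record from time 38 on, i.e., pass the first 37 observations.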

Sequential stochastic assignment problems
The unified framework we propose leverages the sequential assignment model toward the solution of the problems presented in Section 2. In this section we consider two formulations of the stochastic sequential assignment problem: the first is the classical formulation introduced by Derman, Lieberman & Ross (1972), while the second one is an extension for random horizon.

Sequential assignment problem with fixed horizon
The formulation below follows the terminology used by Derman, Lieberman & Ross (1972). Suppose that n jobs arrive sequentially in time; we refer to n as the problem horizon. The tth job, 1 ≤ t ≤ n, is identified with a random variable Y_t which is observed. The jobs must be assigned to n persons who have known "values" p_1, …, p_n. Exactly one job should be assigned to each person, and after the assignment the person becomes unavailable for subsequent jobs. If the tth job is assigned to the jth person then a reward of p_j Y_t is obtained. The goal is to maximize the expected total reward. Formally, assume that Y_1, …, Y_n are integrable independent random variables defined on a probability space (Ω, F, P), and let F_t be the distribution function of Y_t for each t. Let Y_t denote the σ-field generated by (Y_1, …, Y_t): Y_t = σ(Y_1, …, Y_t), 1 ≤ t ≤ n. Suppose that π = (π_1, …, π_n) is a permutation of {1, …, n} defined on (Ω, F). We say that π is an assignment policy (or simply a policy) if {π_t = j} ∈ Y_t for every 1 ≤ j ≤ n and 1 ≤ t ≤ n. That is, π is a policy if it is non-anticipating relative to the filtration Y = {Y_t, 1 ≤ t ≤ n}, so that the tth job is assigned on the basis of the information in Y_t. Denote by Π(Y) the set of all policies associated with the filtration Y = {Y_t, 1 ≤ t ≤ n}. Now consider the following sequential assignment problem.
In the sequel the following representation will be useful:

Σ_{t=1}^{n} p_{π_t} Y_t = Σ_{j=1}^{n} p_j Y_{ν_j},

where the random variables ν_j ∈ {1, …, n}, j = 1, …, n, are given by the one-to-one correspondence {ν_j = t} = {π_t = j}, 1 ≤ t ≤ n, 1 ≤ j ≤ n. In words, ν_j denotes the index of the job to which the jth person is assigned.
The structure of the optimal policy is given by the following statement.
Theorem 1 (Derman, Lieberman & Ross (1972); Albright (1972)) Consider Problem (AP1) with horizon n. There exist real numbers {a_{j,n}}_{j=0}^{n}, −∞ ≡ a_{0,n} ≤ a_{1,n} ≤ ⋯ ≤ a_{n−1,n} ≤ a_{n,n} ≡ ∞, such that on the first step, when the random variable Y_1 with distribution F_1 is observed, the optimal policy is

π*_1 = Σ_{j=1}^{n} j 1{Y_1 ∈ (a_{j−1,n}, a_{j,n}]}.

The numbers {a_{j,n}}_{j=1}^{n} do not depend on p_1, …, p_n and are determined by the recursive relationship

a_{j,t+1} = a_{j−1,t} F_{n−t+1}(a_{j−1,t}) + ∫_{a_{j−1,t}}^{a_{j,t}} y dF_{n−t+1}(y) + a_{j,t} (1 − F_{n−t+1}(a_{j,t})),  j = 1, …, t,

where −∞ · 0 and ∞ · 0 are defined to be 0. At the end of the first stage the assigned p is removed from the feasible set and the process repeats with the next observation: the above calculation is then performed relative to the distribution F_2, and real numbers −∞ ≡ a_{0,n−1} ≤ a_{1,n−1} ≤ ⋯ ≤ a_{n−2,n−1} ≤ a_{n−1,n−1} ≡ ∞ are determined, and so on. Moreover, a_{j,n+1} = E Y_{ν_j} for all 1 ≤ j ≤ n, i.e., a_{j,n+1} is the expected value of the job which is assigned to the jth person, and Σ_{j=1}^{n} p_j a_{j,n+1} is the optimal value of the problem.
Remark 1 In order to determine an optimal policy we calculate inductively a triangular array {a_{j,t}}_{j=1}^{t−1} for t = 2, …, n + 1, where F_{n−t+2} is used in order to compute {a_{j,t}}_{j=1}^{t−1}. In implementation, the optimal policy uses the numbers a_{1,n}, a_{2,n}, …, a_{n−1,n} in order to identify the one value from p_1, …, p_n which will multiply Y_1. Then this value of p is excluded from the n values, and the numbers a_{1,n−1}, a_{2,n−1}, …, a_{n−2,n−1} are used for the determination of the next value of p from the n − 1 remaining values; this value will multiply Y_2, and so on. At the last step the number a_{1,2} is used to assign one of the two remaining values of p to Y_{n−1}. Finally, the last remaining value of p is assigned to Y_n.
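To make the triangular-array computation concrete, the recursion can be run for i.i.d. U[0, 1] job sizes, where the integral ∫ y dF reduces to a difference of squares; a sketch under these assumptions (the function name and the clamping of the uniform c.d.f. are ours). It returns the numbers a_{1,n+1} ≤ ⋯ ≤ a_{n,n+1}, i.e., the expected values E Y_{ν_j}, which sum to E Σ_t Y_t = n/2:

```python
import math

def dlr_thresholds(n):
    # threshold recursion specialized to i.i.d. U[0,1] job sizes:
    # a_{j,t+1} = a_{j-1,t} F(a_{j-1,t}) + int_{a_{j-1,t}}^{a_{j,t}} y dy
    #             + a_{j,t} (1 - F(a_{j,t})), with -inf*0 := 0, inf*0 := 0
    a = [-math.inf, math.inf]                 # t = 1: no interior thresholds
    for t in range(1, n + 1):
        new = [-math.inf]
        for j in range(1, t + 1):
            lo, hi = a[j - 1], a[j]
            clo = min(max(lo, 0.0), 1.0)      # F(lo) for the uniform c.d.f.
            chi = min(max(hi, 0.0), 1.0)
            term_lo = lo * clo if math.isfinite(lo) else 0.0
            term_hi = hi * (1.0 - chi) if math.isfinite(hi) else 0.0
            new.append(term_lo + (chi * chi - clo * clo) / 2.0 + term_hi)
        new.append(math.inf)
        a = new
    return a[1:-1]                            # a_{1,n+1} <= ... <= a_{n,n+1}
```

For n = 2 this yields (0.375, 0.625): with threshold a_{1,2} = 1/2 on the first job, the expected values assigned to the smaller and larger p are 3/8 and 5/8.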

Stochastic sequential assignment problems with random horizon
In practical situations the horizon, or number of available jobs, n is often unknown. Under these circumstances the optimal policy of Derman, Lieberman & Ross (1972) is not applicable. This fact provides motivation for the setting with random number of jobs. The sequential assignment problem with random horizon was formulated and solved by Sakaguchi (1984) who derived the optimal policy using dynamic programming principles. More recently, Nikolaev & Jacobson (2010) also considered the sequential assignment problem with a random horizon. They show that the optimal solution to the problem with random horizon can be derived from the solution to an auxiliary assignment problem with dependent job sizes. Below we demonstrate that the problem with random horizon is in fact equivalent to a certain version of the sequential assignment problem with fixed horizon and independent job sizes.
The stochastic sequential assignment problem with random horizon is stated as follows.

Remark 2 (i) The probability model of Problem (AP2) postulates that the decision maker observes the vector (Ȳ_1, …, Ȳ_{N_max}) that is generated as follows. Given a random variable N and a sequence {Y_t, t ≥ 1}, independent of N, the decision maker is presented with Ȳ_t := Y_t 1{N ≥ t}, t = 1, …, N_max, where N takes the values 1, 2, …, N_max with respective probabilities γ_1, γ_2, …, γ_{N_max}.
(ii) The definition of the sequence {Ȳ_t, t ≥ 1} and the condition P(Y_t = 0) = 0 for all t imply that the first observed zero value of Ȳ_t designates termination of the assignment process. In particular, Ȳ_t = 0 implies that Ȳ_s = 0 for all s ≥ t.
In the following statement we show that Problem (AP2) is equivalent to a version of Problem (AP1), the standard sequential assignment problem with fixed horizon and independent job sizes.
Theorem 2 The optimal value in Problem (AP2) coincides with the optimal value in Problem (AP1) associated with fixed horizon n = N_max and independent job sizes Y_t Σ_{k=t}^{N_max} γ_k, t = 1, …, N_max. The optimal policy in Problem (AP2) follows the one in Problem (AP1) with fixed horizon n = N_max and independent job sizes Y_t Σ_{k=t}^{N_max} γ_k until the first zero value of Ȳ_t is observed; this indicates termination of the assignment process.
Proof : With the introduced notation, for any π ∈ Π(Ȳ),

S_γ(π) = E Σ_{t=1}^{N_max} p_{π_t} Ȳ_t = E Σ_{t=1}^{N_max} p_{π_t} Y_t 1{N ≥ t}.  (5)

It follows from (5) that the expected total reward S_γ(π) is fully determined by the values of p_{π_t} on the events {N ≥ t}, t = 1, …, N_max, only; the value of p_{π_t} on {N < t} is irrelevant as the ensuing reward is equal to zero. Note that π_t is Ȳ_t-measurable, i.e., π_t = π_t(Ȳ_1, …, Ȳ_t) for any t = 1, …, N_max. However, by definition, Ȳ_s = Y_s for all s ≤ t on the event {N ≥ t}. This implies that in (5) the decision variable π_t can be taken to be Y_t-measurable. It follows that

S_γ(π) = E Σ_{t=1}^{N_max} p_{π_t} Y_t P(N ≥ t) = E Σ_{t=1}^{N_max} p_{π_t} (Y_t Σ_{k=t}^{N_max} γ_k),

where the last equality follows from independence of N and {Y_t}. Thus the optimal value coincides with the one in the assignment problem with fixed horizon n = N_max and independent job sizes Y_t Σ_{k=t}^{N_max} γ_k. As long as the assignment process proceeds, the optimal policy follows the one in said problem with fixed horizon n = N_max and independent job sizes Y_t Σ_{k=t}^{N_max} γ_k. The first observed zero value of Ȳ_t indicates termination of the assignment process due to horizon randomness.

Remark 3 To the best of our knowledge, the relation between Problems (AP2) and (AP1) established in Theorem 2 is new. In fact, this relationship is implicit in the optimal policy derived in Sakaguchi (1984); however, Sakaguchi (1984) does not mention this. In contrast, Nikolaev & Jacobson (2010) develop the optimal policy by reduction of the problem to an auxiliary one with dependent job sizes. As Theorem 2 shows, this is not necessary: the problem with a random number of jobs is equivalent to the standard sequential assignment problem with independent job sizes, and it is solved by the standard procedure of Derman, Lieberman & Ross (1972).
Remark 4 In Theorem 2 we assume that N_max is finite. Under suitable assumptions on the weights {p_j} and job sizes {Y_t} one can construct ε-optimal policies for the problem with infinite N_max. However, we do not pursue this direction here.

A unified approach for solving sequential selection problems

An auxiliary optimal stopping problem

Consider the following auxiliary problem of optimal stopping.
Problem (B) is a specific case of the stochastic sequential assignment problem of Derman, Lieberman & Ross (1972), and Theorem 1 has immediate implications for Problem (B). The following statement is a straightforward consequence of Theorem 1.

Reduction to the auxiliary stopping problem
Problems (A1) and (A2) of Section 2 can be reduced to the optimal stopping of a sequence of independent random variables [Problem (B)]. In order to demonstrate this relationship we use well known properties of the relative and absolute ranks defined in (1). These properties are briefly recalled in the next paragraph; for details see, e.g., Gnedin & Krengel (1996).
Let A_n := (A_{1,n}, …, A_{n,n}), and note that A_n is uniformly distributed over the set of all permutations of {1, …, n}: P(A_n = A) = 1/n! for every permutation A and all n. The random variables {R_t, t ≥ 1} are independent, and P(R_t = r) = 1/t for all r = 1, …, t. For any n and t = 1, …, n,

P(A_{t,n} = a | R_t = r) = C(a−1, r−1) C(n−a, t−r) / C(n, t),  a = 1, …, n,  (9)

where C(m, i) denotes the binomial coefficient. Now we are in a position to establish a relationship between Problems (A1) and (B).
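The conditional law of the absolute rank given the relative rank is easy to tabulate numerically; the following sketch (the function name is ours) also lets one check that it sums to one and that E[A_{t,n} | R_t = r] = r(n + 1)/(t + 1):

```python
from math import comb

def rank_dist(n, t, r):
    # P(A_{t,n} = a | R_t = r) for a = 1..n: of the other t-1 early arrivals,
    # r-1 must come from the a-1 values beating X_t and t-r from the n-a rest
    return [comb(a - 1, r - 1) * comb(n - a, t - r) / comb(n, t)
            for a in range(1, n + 1)]
```

For t = n the distribution degenerates to a point mass at a = r, in accordance with A_{n,n} = R_n.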
It follows from (9) that the conditional expected reward for stopping at time t given R_t = r equals

I_{t,n}(r) := E[q(A_{t,n}) | R_t = r] = Σ_{a=1}^{n} q(a) P(A_{t,n} = a | R_t = r).  (10)

Define

Y_t := I_{t,n}(R_t),  t = 1, …, n.  (11)

By independence of the relative ranks, {Y_t} is a sequence of independent random variables. The relationship between stopping problems (A1) and (B) is given in the next theorem.
Theorem 3 The optimal stopping rule τ* solving Problem (B) with random variables {Y_t} given in (10)-(11) also solves Problem (A1).

Proof : First we note that for any stopping rule τ ∈ T(R) one has E q(A_{τ,n}) = E Y_τ, where we have used the fact that {τ = k} ∈ R_k. This implies that max_{τ∈T(R)} E q(A_{τ,n}) = max_{τ∈T(R)} E Y_τ. To prove the theorem it suffices to show that the maximal expected reward over T(R) coincides with the maximal expected reward in Problem (B), which is statement (12). Because R_1, …, R_n are independent random variables and Y_t = I_{t,n}(R_t) for every t, the independence relations (13) and (14) hold for any s, t ∈ {1, …, n} with s < t. The statement (12) then follows from (13), (14) and Theorem 5.3 of Chow et al. (1971). In fact, (12) is a consequence of the well-known fact that randomization does not increase rewards in stopping problems (Chow et al. 1971, Chapter 5). This concludes the proof.
It follows from Theorem 3 that the optimal stopping rule in Problem (A1) is given by Corollary 1 with random variables {Y t } defined by (11). To implement the rule we need to compute the distributions {F t } of the random variables {Y t } and to apply formulas (6) and (7).
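For finitely supported Y_t, the computation amounts to ordinary backward induction over the independent sequence: the continuation value is folded from the last stage, where stopping is forced, back to the first. A generic sketch (representation and names are ours), instantiated on the secretary problem where Y_t = (t/n) 1{R_t = 1}:

```python
def optimal_stopping_value(dists):
    # dists[t] lists (value, probability) pairs for Y_{t+1}; stopping is
    # forced at the last stage, and v is the optimal expected reward to go
    v = sum(y * p for y, p in dists[-1])
    for dist in reversed(dists[:-1]):
        v = sum(max(y, v) * p for y, p in dist)
    return v

# the secretary problem as a special case: Y_t = (t/n) 1{R_t = 1}
n = 10
dists = [[(t / n, 1 / t), (0.0, 1 - 1 / t)] for t in range(1, n + 1)]
```

For n = 10, optimal_stopping_value(dists) returns the classical optimal probability 0.39869…, matching the cutoff-rule value (3/10)(1/3 + ⋯ + 1/9).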
Random horizon. Next, we establish a correspondence between Problems (A2) and (B). Let

J_t(r) := Σ_{k=t}^{N_max} γ_k I_{t,k}(r),  (15)

where I_{t,k}(·) is given in (10) and γ_k = P(N = k). Below, in the proof of Theorem 4, we show that V_γ(q; τ) = E Y_τ for any τ ∈ T(R), where

Y_t := J_t(R_t),  t = 1, …, N_max.  (16)

Define also W_{Ñ}(τ) := E Σ_{k=τ}^{Ñ} γ_k I_{τ,k}(R_τ) for a finite integer Ñ.

Theorem 4 (i) Let N_max < ∞; then the optimal stopping rule τ* solving Problem (B) with fixed horizon N_max and random variables {Y_t} given in (15)-(16) provides the optimal solution to Problem (A2).

(ii) Let N_max = ∞ and assume that the distribution of N satisfies the tail condition (17). Let ε > 0 be arbitrary; then there exists Ñ_max = Ñ_max(ε) such that (18) holds for any stopping rule τ ∈ T(R). In particular, the optimal stopping rule τ* solving Problem (B) with fixed horizon Ñ_max = Ñ_max(ε) and {Y_t} given in (15)-(16) is an ε-optimal stopping rule for Problem (A2) in the sense of (19).

Proof : (i). In Problem (A2) the reward for stopping at time t is Q_t = q(A_{t,N}) 1{N ≥ t}, and the objective is to maximize E Q_τ with respect to stopping times τ of the filtration R̄ [see (3)]. First, we argue that as long as the decision process does not terminate before time t, we can restrict ourselves to stopping times τ adapted to the filtration R. This is a consequence of the fact that the performance V_γ(q; τ) = E Q_τ of any stopping rule τ ∈ T(R̄) is fully determined by its probabilistic properties on the event {τ ≤ N} only. Indeed, any τ ∈ T(R̄) is described by indicator functions ϕ_t = ϕ_t(R̄_1, …, R̄_t) of the events {τ = t}. However, on the event {N ≥ t}, when the decision process is at time t, we have R̄_1 = R_1, …, R̄_t = R_t, so that in fact ϕ_t = ϕ_t(R_1, …, R_t). Thus, in view of the structure of the reward function, at any time instance t at which the decision is made we may consider stopping rules adapted to R only, i.e., τ ∈ T(R). This implies, by conditioning,

V_γ(q; τ) = Σ_{t=1}^{N_max} E[q(A_{t,N}) 1{N ≥ t} 1{τ = t}] = Σ_{t=1}^{N_max} Σ_{k=t}^{N_max} γ_k E[I_{t,k}(R_t) 1{τ = t}] = E Y_τ,

where Y_t = Σ_{k=t}^{N_max} γ_k I_{t,k}(R_t), t = 1, …, N_max [cf. (16)]. Here the second equality follows from {τ = t} ∈ R_t on {N ≥ t}, while the third equality holds by independence of N and {R_t, t ≥ 1}. The remainder of the proof proceeds along the lines of the proof of Theorem 3.
(ii). In view of the proof of (i) we can restrict ourselves to the stopping rules τ ∈ T(R). Let Ñ_max = Ñ_max(ε) be the minimal integer such that the truncation condition (21) holds; the existence of Ñ_max(ε) follows from (17). By (20) and (21), for any stopping rule τ ∈ T(R) we have V_γ(q; τ) = E Σ_{k=τ}^{∞} γ_k I_{τ,k}(R_τ), and the truncated quantity W_{Ñ_max}(τ) differs from V_γ(q; τ) by at most ε. This implies (18). In order to prove (19) we note that if τ̄ is the optimal stopping rule in Problem (A2) then by (18) and the definition of τ*,

V_γ(q; τ̄) = V*_γ(q) ≤ W_{Ñ_max}(τ̄) + ε ≤ W_{Ñ_max}(τ*) + ε,

which proves the upper bound in (19). The lower bound follows in view of (18). This concludes the proof.

Condition (17) imposes restrictions on the tail of the distribution of N. It can be easily verified in any concrete setting; for details see Section 5.

Theorems 3 and 4 reduce Problems (A1) and (A2) to Problem (B) with random variables {Y_t} defined in (10)-(11) and (15)-(16), respectively. The latter problem is solved by the recursive procedure given in Corollary 1.

Specification of the optimal stopping rule for Problems (A1) and (A2)
Now, using Theorems 3 and 4, we specialize the result of Corollary 1 for the solution of Problems (A1) and (A2). For this purpose we require the following notation:

U_t(r) := I_{t,n}(r) for Problem (A1),  U_t(r) := J_t(r) for Problem (A2).
Note that in Problem (A2) we put ν = N_max for distributions with a finite right endpoint N_max < ∞; otherwise ν = Ñ_max, where Ñ_max is defined in the proof of Theorem 4. With this notation, Problem (B) is associated with the independent random variables Y_t = U_t(R_t) for t = 1, …, ν.
Expectation of stopping times. As we have already mentioned, in the considered problems the optimal stopping rule belongs to the class of memoryless threshold policies. This facilitates derivation of the distributions of the corresponding stopping times and calculation of their probabilistic characteristics. One important characteristic is the expected time elapsed before stopping. In problems with fixed horizon ν = n it is given by

E τ* = Σ_{t=1}^{n} P(τ* ≥ t) = Σ_{t=1}^{n} Π_{s=1}^{t−1} F_s(b_{n−s+1}),

where {F_t} and {b_t} are defined in (23) and (24)-(25).
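As an example of such a computation, for the classical secretary cutoff rule the product collapses to a closed form: from time r on the rule stops at each t with probability 1/t independently, so P(τ* ≥ t) = Π_{s=r}^{t−1} (1 − 1/s) = (r − 1)/(t − 1) for t > r. A sketch (the function name is ours):

```python
def expected_stopping_time(n, r):
    # cutoff rule: from time r on, stop at the first relative-best item,
    # with a forced stop at n; E[tau] = sum_{t=1}^{n} P(tau >= t)
    return sum(1.0 if t <= r else (r - 1) / (t - 1) for t in range(1, n + 1))
```

For instance, with n = 3 and r = 2 the rule stops at time 2 or 3 with equal probability, so the expected stopping time is 2.5.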
In the problems where the horizon N is random, the time until stopping is τ* ∧ N. In this case E(τ* ∧ N) is computed by an analogous product formula which involves, in addition, the distribution of N.

Implementation
In this section we present an efficient algorithm implementing the optimal stopping rule described earlier. In order to implement (24)-(25) we need to find the sets {y_t(j), j = 1, …, ℓ_t} of values taken by the random variables Y_t, t = 1, …, ν, and to compute the corresponding probabilities {f_t(j), j = 1, …, ℓ_t}.
The following algorithm implements the optimal policy.

Solution of the sequential selection problems
In this section we revisit problems (P1)-(P12) discussed earlier from the viewpoint of the proposed framework. We refer to Section 2 for detailed description of these problems and related literature.

Problems with fixed horizon
First we consider problems (P1)-(P5) with fixed horizon; in all these problems ν = n.
The classical secretary problem

This is Problem (P1). The random variable Y_t = (t/n) 1{R_t = 1} = P(A_{t,n} = 1 | R_t) takes the two values y_t(1) = t/n and y_t(2) = 0 with probabilities f_t(1) = 1/t and f_t(2) = 1 − 1/t. In this case Step 4 of Algorithm 1 reduces to a simple threshold comparison: the optimal policy is to stop at the first time instance t such that Y_t > b_{n−t+1}, i.e., the first t such that R_t = 1 and t/n > b_{n−t+1}, which coincides with well-known results.
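This stopping rule can be checked numerically by backward induction over the independent Y_t's; a sketch (the function name is ours), where w plays the role of the optimal continuation value:

```python
def secretary_cutoff(n):
    # backward induction on Y_t = (t/n) 1{R_t = 1}: stop at t iff the stop
    # reward t/n exceeds the optimal continuation value w_{t+1}
    w = 0.0
    cutoff = n
    for t in range(n, 0, -1):
        if t / n > w:
            cutoff = t
        w = (1 / t) * max(t / n, w) + (1 - 1 / t) * w
    return cutoff
```

For n = 100 this returns 38, i.e., pass the first 37 observations, in agreement with the classical cutoff.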

Selecting one of k best alternatives
This setting is stated as Problem (P2) in Section 2. In this problem q(a) = 1{a ≤ k} with some k ≤ n. We will assume here that k ≥ 2; the case k = 1 was treated above. An explicit expression for U_t(r), given in (33), holds for 1 ≤ r ≤ k and t = 1, . . . , n.
Using this formula together with the recursive relationship (30) we can determine the structure of vector U t := (U t (1), . . . , U t (t)) for each t = 1, . . . , n, and compute {y t (j)} and {f t (j)}. Specifically, the following facts are easily verified.
(a) Let n − k + 2 ≤ t ≤ n. Here the vector U_t has the following structure: the first t + k − n components equal one, the next n − t components are distinct numbers in (0, 1) given in (33), and the last t − k components equal zero. Note that if k = 2 this regime reduces to t = n; therefore if k = 2 or t = n then U_n is given by (34). These facts imply the expressions (35)-(38) for {y_t(j)} and {f_t(j)}. If t = n then ℓ_t = 2, y_n(1) = 1, y_n(2) = 0, f_n(1) = k/n, f_n(2) = 1 − k/n.
In our implementation we compute U_t(j) for t = 1, . . . , n and j = 1, . . . , t using (34) and (30). Then {y_t(j)}, {f_t(j)} and the sequence {b_t} are easily calculated from (35)-(38) and (32), respectively. Table 1 presents exact values of the optimal probability P(n, k) = b_{n+1} and of the expected time until stopping E(n, k) = E(τ*) normalized by n, for different values of k and n. We are not aware of works that report exact results for general k and n as presented in Table 1. These results should be compared to the asymptotic values of 1 − P(n, k) as n → ∞ computed in Frank & Samuels (1980, Table 1) for a range of values of k. The comparison shows that the approximate values in Frank & Samuels (1980) are in good agreement with the exact values of Table 1. For instance, for n = 100 the approximate values coincide with the exact ones up to the third digit after the decimal point.
It is worth noting that the optimal policy developed by Gusein-Zade (1966) is expressed in terms of relative ranks. In contrast, our policy is expressed via the random variables Y_t = U_t(R_t), and it is a memoryless threshold policy in terms of {Y_t}. This allows efficient computation of the distribution of the optimal stopping time and, in particular, of the expected time until stopping. The value of E(n, k) is computed using formula (26) combined with (22) and (33)-(38). The presented numbers agree with the asymptotic results of Yeo (1997) proved for k = 2, 3 and 5.

Table 1: Optimal probabilities P(n, k) and the normalized expected time elapsed until stopping E(n, k)/n for selecting one of the k best values.
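Since (30)-(38) are stated elsewhere in the paper, a self-contained sketch may instead compute U_t(r) = P(A_{t,n} ≤ k | R_t = r) directly from the standard rank identity P(A_{t,n} = a | R_t = r) = C(a − 1, r − 1)C(n − a, t − r)/C(n, t) (we assume this identity here; C(a, b) is the binomial coefficient) and run the same backward induction:

```python
from math import comb

def best_k_value(n, k):
    """P(n, k): optimal probability that the selected item is among the k best.
    Uses Y_t = U_t(R_t) with U_t(r) = P(A_{t,n} <= k | R_t = r), R_t uniform
    on 1..t, and the backward induction b_{m+1} = E max(Y_{n-m+1}, b_m)."""
    def U(t, r):
        # out-of-range binomial coefficients vanish, so the sum is safe
        return sum(comb(a - 1, r - 1) * comb(n - a, t - r)
                   for a in range(r, k + 1)) / comb(n, t)
    b = float("-inf")
    for m in range(1, n + 1):
        t = n - m + 1
        b = sum(max(U(t, r), b) for r in range(1, t + 1)) / t
    return b
```

For k = 1 this reduces to the classical secretary value, which provides a consistency check.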

Selecting the k-th best alternative
This setting is discussed in Section 2 as problem (P3). In this problem q(a) = 1{a = k}, k ≥ 2. Similarly to the Gusein-Zade stopping problem, here we have three different regimes that define explicit relations for {U t (r)}, {y t (j)} and {f t (j)}.
(a) Let 1 ≤ t ≤ k; then all values U_t(1), . . . , U_t(t) are positive and distinct, and {y_t(j)}, {f_t(j)} follow immediately.
(b) In the intermediate regime the set {U_t(1), . . . , U_t(t)} contains k + 1 distinct values: U_t(1), . . . , U_t(k) are positive and distinct, and U_t(k + 1) = · · · = U_t(t) = 0.
(c) Let n − k + 2 ≤ t ≤ n; then the sequence {U_t(r)} takes the values

U_t(r) = C(k − 1, r − 1) C(n − k, t − r) / C(n, t) for r = t − n + k, . . . , k, and U_t(r) = 0 for r = k + 1, . . . , t,

where C(a, b) denotes the binomial coefficient.
Table 2 presents the optimal probabilities of selecting the k-th best alternative, computed using (39)-(42), for a range of k and n. In the specific case k = 2, Rose (1982a) showed that the optimal stopping rule admits an explicit form, and the optimal probability is P(n, 2) = (n + 1)/(4n) if n is odd. The results for k = 2 in Table 2 are in full agreement with this formula. The table also presents numerical computations of the optimal values in the problem of selecting the median value; see Rose (1982b), who proved that lim_{n→∞} V*_n(q_pd^((n+1)/2)) = 0.

Table 2: Optimal probabilities P(n, k) and the normalized expected time elapsed until stopping E(n, k)/n for selecting the k-th best alternative.
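The hypergeometric expression of regime (c) is the conditional probability P(A_{t,n} = k | R_t = r); assuming, as we do in this sketch, that it is valid for all t with the convention that out-of-range binomial coefficients vanish, it yields a compact computation of P(n, k). Rose's formula for k = 2 and odd n provides a check:

```python
from math import comb

def kth_best_value(n, k):
    """P(n, k): optimal probability of selecting exactly the k-th best item,
    with U_t(r) = C(k-1, r-1) C(n-k, t-r) / C(n, t) as in regime (c)."""
    b = float("-inf")
    for m in range(1, n + 1):
        t = n - m + 1
        c = comb(n, t)
        b = sum(max(comb(k - 1, r - 1) * comb(n - k, t - r) / c, b)
                for r in range(1, t + 1)) / t
    return b
```

For n = 3, k = 2 a direct computation by hand gives the value 1/3 = (n + 1)/(4n), matching Rose's formula.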

Expected rank type problems
In this section we consider problems (P4) and (P5) discussed in Section 2.
Expected rank minimization. Following (2) we consider the problem of minimizing the expected absolute rank E A_{τ,n}, i.e., maximizing E q(A_{τ,n}) with q(a) = −a. It is well known that E[A_{t,n} | R_t = r] = (n + 1)r/(t + 1); therefore for t = 1, . . . , n we have Y_t = −(n + 1)R_t/(t + 1). Substitution into (25) yields b_1 = −∞, b_2 = −(n + 1)/2, and straightforward calculation shows that (43) takes an explicit form with j_t := ⌊−b_t(n − t + 2)/(n + 1)⌋. The optimal policy is to stop at the first time instance t such that Y_t > b_{n−t+1}, i.e., such that (n + 1)R_t/(t + 1) < −b_{n−t+1}. Then, according to (2), the optimal value of the problem equals −b_{n+1}. We note that the derived recursive procedure coincides with the one of Chow et al. (1964), and the calculation for n = 10^6 yields the optimal value 3.86945 . . .
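The truncation index j_t makes the recursion linear in n: at time t only relative ranks r ≤ j stop, and their contribution sums in closed form. A sketch (our variable names; consistent with, but not copied from, (43)):

```python
def expected_rank_value(n):
    """Optimal expected rank via the recursion of Chow et al. (1964), O(n).
    Y_t = -(n+1) R_t/(t+1); at time t only ranks r <= j = floor(-b (t+1)/(n+1))
    stop, which avoids the inner sum over r."""
    b = -(n + 1) / 2                          # b_2 = E Y_n = -(n+1)/2
    for t in range(n - 1, 0, -1):
        j = min(t, int(-b * (t + 1) / (n + 1)))     # number of stopping ranks
        stop = -(n + 1) / (t + 1) * j * (j + 1) / 2  # sum of stopped values
        b = (stop + (t - j) * b) / t
    return -b                                  # optimal value is -b_{n+1}
```

For n = 3 the value 5/3 can be verified by hand; as n grows the value approaches the limit 3.8695 . . . quoted above.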
Expected squared rank minimization. This problem was posed in Robbins (1991) and, to the best of our knowledge, has not been solved to date. We show that the proposed unified framework can be used to compute efficiently the optimal policy and its value.

Problems with random horizon
This section demonstrates how to apply the proposed framework to selection problems with a random horizon. In these problems we apply Algorithm 1 with ν equal to the maximal horizon length N_max, provided that N_max is finite, or to a sufficiently large horizon Ñ_max if N_max is infinite. Moreover, U_t(r) = J_t(r), where {J_t(r)} is given by (15).
Recall that in all problems with random horizon the selection may not be made by the time the observation process terminates. However, Theorems 2 and 4 show that as long as the observation process proceeds, the optimal stopping rule is identical to the one in the setting with fixed horizon N max and random variables Y t := U t (R t ), t = 1, . . . , N max , where U t (·) is defined in (31). In the subsequent discussion of specific problem instances with random horizon we use this fact without further mention.

Classical secretary problem with random horizon
This is Problem (P6) of Section 2, where q(a) = 1{a = 1}. Note that if N_max = ∞ then condition (17) is trivially fulfilled.

Table 4: Optimal values V*(N_max) := P{A_{τ*,N} = 1, τ* ≤ N} for a uniformly distributed horizon length N, and normalized expected times until stopping E*(N_max) and E*(n) for random and fixed horizons.
The random variables Y_t = U_t(R_t) = 1{R_t = 1} · t Σ_{k=t}^{ν} γ_k/k take two different values, y_t(1) = t Σ_{k=t}^{ν} γ_k/k and y_t(2) = 0, with corresponding probabilities f_t(1) = 1/t and f_t(2) = 1 − 1/t. Substituting these values into (32) we obtain b_1 = −∞, b_2 = γ_ν/ν, and the recursion (45) for t = 2, . . . , ν. The optimal policy is to stop at time t if Y_t > b_{ν−t+1}; see (46). Presman and Sonin (1972) investigated the structure of optimal stopping rules and showed that, depending on the distribution of N, the stopping region can involve several "islands," i.e., it can be a union of disjoint subsets of {1, . . . , N_max}. Note that (46) determines the stopping region automatically: indeed, it is optimal to stop only at those t's that satisfy t Σ_{k=t}^{ν} γ_k/k > b_{ν−t+1}. We apply the stopping rule (45)-(46) to two examples of distributions of N. In the first example N is uniformly distributed on the set {1, . . . , N_max}; as is known, in this case the optimal stopping region has only one "island." The second example illustrates a setting in which the stopping region has more than one "island."
1. Uniform distribution. In this case ν = N_max and γ_k = 1/N_max, k = 1, . . . , N_max. It was shown in Presman and Sonin (1972) that the optimal stopping region in this problem has one "island," i.e., the optimal policy selects the first best member appearing in the range {k_n, . . . , n}. The recursion (45) with γ_k = 1/N_max yields the optimal values V*(N_max) := P{A_{τ*,N} = 1, τ* ≤ N} given in Table 4. The second line of Table 4 presents the normalized expected time until stopping E*(N_max) := E(τ* ∧ N_max)/N_max computed using (27), (28) and (29). For comparison, we also give the normalized expected time elapsed until stopping E*(n) := Eτ*/n for the optimal stopping rule in the classical secretary problem (see the third line of the table); these numbers are calculated using (26).
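A sketch of the recursion (45) for an arbitrary horizon pmf {γ_k} (function and variable names are ours):

```python
def secretary_random_horizon(gamma):
    """V*(N_max) for horizon pmf gamma[k-1] = P(N = k), k = 1..N_max,
    using Y_t = 1{R_t = 1} * t * sum_{k>=t} gamma_k / k."""
    nu = len(gamma)
    s = [0.0] * (nu + 2)                     # s[t] = sum_{k=t}^{nu} gamma_k / k
    for t in range(nu, 0, -1):
        s[t] = s[t + 1] + gamma[t - 1] / t
    b = float("-inf")                        # b_1 = -infinity
    for m in range(1, nu + 1):
        t = nu - m + 1
        b = (1 / t) * max(t * s[t], b) + (1 - 1 / t) * max(0.0, b)
    return b                                 # V*(N_max) = b_{nu+1}
```

For N_max = 1 it returns 1, and for N_max = 2 with the uniform distribution it returns 3/4, both easily checked by hand; for large uniform horizons the value approaches 2e^{−2} = 0.27067 . . .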
As expected, E * (N max ) is significantly smaller than E * (n); the optimal rule is more cautious when the horizon is random.
It was also shown in Presman and Sonin (1972) that lim_{N_max→∞} V*(N_max) = 2e^{−2} = 0.27067 . . .; the numbers in Table 4 are in full agreement with this result. Figure 1(a) displays the sequences {b_{N_max−t+1}} and {t Σ_{k=t}^{N_max} γ_k/k} for the uniform distribution with N_max = 100. Note that the stopping region is the set of t's where the blue curve is above the red curve; thus, there is only one "island" in this case.

Mixture of two zero-inflated binomial distributions.
Here we assume that the distribution G_N of N is a mixture of two zero-inflated binomial distributions based on X_1 ∼ Bin(50, 0.2) and X_2 ∼ Bin(100, 0.8). The optimal stopping rule is given by (45)-(46) with the corresponding {γ_k}. Figure 1(b) displays the graphs of the sequences {b_{N_max−t+1}} and {t Σ_{k=t}^{N_max} γ_k/k}. It is clearly seen that in this setting the stopping region is a union of two disjoint sets of consecutive integers; these sets correspond to the indices where the graph of {t Σ_{k=t}^{N_max} γ_k/k} is above the graph of {b_{N_max−t+1}}. The stopping region can be easily identified from the given formulas.

Selecting one of k best alternatives with random horizon
This is Problem (P7) of Section 2; here q(a) = 1{a ≤ k}. Algorithm 1 is implemented similarly to Problem (P2). First, the values I_{t,k}(r), k = 1, . . . , N_max, t = 1, . . . , k, r = 1, . . . , t, are calculated using the recursive formula (30) along with the boundary condition (34). Then, using (31), we compute U_t(1), . . . , U_t(t) for t = 1, . . . , N_max, and find the distinct values y_t(1), . . . , y_t(ℓ_t) of the vector (U_t(1), . . . , U_t(t)) for all t = 1, . . . , N_max. Finally, the sequence {b_t} is found from (32). The optimal policy is to stop at the first time instance t such that Y_t = U_t(R_t) > b_{N_max−t+1}, provided that the corresponding value Y_t is different from zero; otherwise, the selection process terminates at the problem horizon N. The optimal value of the problem is P(N_max, k) := P{A_{τ*,N} ≤ k, τ* ≤ N} = b_{N_max+1}. We apply this algorithm to two different examples: a uniform horizon distribution, and a U-shaped distribution. The second example demonstrates that the optimal stopping region can have "islands" in the terminology of Presman and Sonin (1972).
Exact values of the optimal probability P(N_max, k) are reported for both examples. For k = 1 the values of P(N_max, 1) are in agreement with the values of Table 4, and also with the asymptotic value obtained by Presman and Sonin (1972), lim_{N_max→∞} P(N_max, 1) = 2e^{−2} = 0.27067 . . .. For k = 2 the values of P(N_max, 2) are in agreement with the values of Table 1 in Kawai & Tamaki (2003), and also with the asymptotic value obtained there, lim_{N_max→∞} P(N_max, 2) ≈ 0.4038.

U-shaped distribution.
In this example we let N_max = 100 and consider the problem of selecting one of the three best alternatives, i.e., k = 3. The optimal value in this problem is P(100, 3) = 0.39711. Figure 2 displays the graphs of the sequences {b_{N_max−t+1}} and {U_t(r)}, r = 1, 2, 3, from which the form of the stopping region is easily inferred.
Recall that the optimal policy stops when Y_t = U_t(R_t) > b_{N_max−t+1}, provided that the decision process arrives at time t. Therefore the stopping region corresponds to the set of time instances for which the graphs of {U_t(r)}, r = 1, 2, 3, are above the graph of {b_{N_max−t+1}}. In particular, Figure 2 shows that the optimal stopping policy is the following. If the decision process does not terminate due to horizon randomness then: pass the first four observations t = 1, . . .

Expected rank minimization over random horizon
In this setting [Problem (P8) of Section 2] we would like to minimize the expected absolute rank on the event that stopping occurs before N; otherwise we receive the absolute rank of the last available observation, A_{N,N} = R_N. Letting q(A_{t,k}) = R_k − A_{t,k} for t ≤ k, we note that

I_{t,k}(r) = E[q(A_{t,k}) | R_1 = r_1, . . . , R_{t−1} = r_{t−1}, R_t = r] = (k + 1)/2 − ((k + 1)/(t + 1)) r,

and therefore {U_t(r)} is obtained from (15). If N_max = ∞ then we require that EN < ∞; this ensures condition (17).
The recursion (48) for computing the optimal value is obtained by substituting these expressions.

Table 6: Optimal values V*(N_max) computed using (48).

We illustrate these results in Table 6. The first row of the table, α = 1, corresponds to the uniform distribution with γ_k = 1/N_max, k = 1, . . . , N_max; for general α > 0 see Gianini-Pettitt (1979). It is seen from the table that in the case α = 3 the optimal value approaches the universal limit of Chow et al. (1964) as N_max grows to infinity. For α = 2, formula (48) yields the optimal value 4.2444 . . .; this complements the result of Gianini-Pettitt (1979) on boundedness of the optimal value.

Multiple choice problems
The existing literature treats sequential multiple choice problems as problems of multiple stopping. However, if the reward function has an additive structure and the involved random variables are independent, then these problems can be reformulated in terms of the sequential assignment problem of Section 3. Under these circumstances the results of Derman, Lieberman & Ross (1972) are directly applicable and can be used to construct optimal selection rules. We illustrate this approach in the next two examples.

Maximizing the probability of selecting the best observation with k choices
This setting was first considered by Gilbert and Mosteller (1966), and it is discussed in Section 2 as Problem (P9). The goal is to maximize the probability of selecting the best observation with k choices, i.e., to maximize this probability with respect to the stopping times τ(k) = (τ_1, . . . , τ_k), τ_1 < · · · < τ_k, of the filtration R. This problem is equivalent to the following version of the sequential assignment problem (AP1) [see Section 3].
The relationship between sequential assignment and multiple choice problems is evident: if a policy π assigns p_{πt} = 1 to the observation Y_t then the corresponding t-th observation is selected, i.e., the events {p_{πt} = 1} and ∪_{j=1}^{k} {τ_j = t} coincide. The optimal policy for the above assignment problem is characterized by Theorem 1. Specifically, for t = 1, . . . , n let p_1^t ≤ p_2^t ≤ · · · ≤ p_{n−t+1}^t be the subset of the coefficients {p_1, . . . , p_n} left unassigned at time t, and let s_t = Σ_{i=1}^{n−t+1} p_i^t denote the number of observations still to be selected (unassigned coefficients p equal to one). The optimal policy π* at time t partitions the real line by numbers −∞ = a_{0,n−t+1} ≤ a_{1,n−t+1} ≤ · · · ≤ a_{n−t,n−t+1} ≤ a_{n−t+1,n−t+1} = ∞, and prescribes to select the t-th observation if Y_t > a_{n−t+1−s_t,n−t+1}. In words, the observation is selected if Y_t is greater than the s_t-th largest of the numbers a_{1,n−t+1}, a_{2,n−t+1}, . . . , a_{n−t,n−t+1}. These numbers are given by the following recursion: a_{0,n−t+1} = −∞, a_{n−t+1,n−t+1} = ∞, and for j = 1, . . . , n − t

a_{j,n−t+1} = ∫_{a_{j−1,n−t}}^{a_{j,n−t}} z dF_{t+1}(z) + a_{j−1,n−t} F_{t+1}(a_{j−1,n−t}) + a_{j,n−t} (1 − F_{t+1}(a_{j,n−t})),

where F_t is the distribution function of Y_t. The optimal value of the problem is S*(k) = S(π*; k) = Σ_{j=1}^{k} a_{n−j+1,n+1}.

Table 7: Optimal values S*(k) in the problem of maximizing the probability of selecting the best option with k choices. The table is computed using (50) and (49) for n = 10^4.
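For the discrete variables Y_t = (t/n)1{R_t = 1} of this problem, an equivalent and simpler route is a direct dynamic program over (time, selections left) rather than the thresholds a_{j,n−t+1}; the two computations must produce the same value S*(k). A sketch (our formulation, assuming additive rewards as in (AP1)):

```python
def best_choice_k_value(n, k):
    """S*(k): maximal probability that the overall best item is among the k
    selections.  DP: V(t, s) = E max(Y_t + V(t+1, s-1), V(t+1, s)),
    with Y_t = (t/n) 1{R_t = 1} and V(n+1, .) = 0, V(., 0) = 0."""
    V = [0.0] * (k + 1)                        # V[s] at time t+1
    for t in range(n, 0, -1):
        newV = [0.0] * (k + 1)
        for s in range(1, k + 1):
            take_best = max(t / n + V[s - 1], V[s])   # R_t = 1, prob 1/t
            take_zero = max(V[s - 1], V[s])           # R_t > 1, so Y_t = 0
            newV[s] = (1 / t) * take_best + (1 - 1 / t) * take_zero
        V = newV
    return V[k]
```

Since the rewards are nonnegative, the constraint of making exactly k selections is not binding here, so the unconstrained DP above attains the same optimum. For k = 1 the function reduces to the classical secretary value.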
The structure of the optimal policy allows us to compute the distribution of the time required for the subset selection. As an illustration, we consider computation of the expected time required for selecting two options (k = 2). According to the optimal policy, the first choice is made at time τ_1 := min{t = 1, . . . , n : Y_t > a_{n−t−1,n−t+1}}, while the second choice occurs at time τ_2 := min{t > τ_1 : Y_t > a_{n−t,n−t+1}}. The expected time to the subset selection is then given by (51)-(53); in particular,

Eτ_1 = 1 + Σ_{j=1}^{n−1} Π_{t=1}^{j} F_t(a_{n−t−1,n−t+1}),   P(τ_1 = j) = (1 − F_j(a_{n−j−1,n−j+1})) Π_{t=1}^{j−1} F_t(a_{n−t−1,n−t+1}).

These formulas are computationally amenable and easy to implement.

Table 8: The optimal value S*(k) in the problem of minimization of the expected average rank with k choices for n = 10^5.

Minimization of the expected average rank with k choices
In this problem, discussed in Section 2 as Problem (P10), we want to minimize the expected average rank of the k selected observations, k^{−1} E Σ_{j=1}^{k} A_{τ_j,n}, where τ(k) = (τ_1, . . . , τ_k), τ_1 < · · · < τ_k, are stopping times of the filtration R. This setting is equivalent to the following sequential assignment problem.
The optimal value S*(k) of the problem is again given by (49). Table 8 presents S*(k) for n = 10^5 and different values of k. It is worth noting that k = 1 corresponds to the standard problem of expected rank minimization [Problem (P4)], whose well-known asymptotic value is S*(1) ≈ 3.8695 . . . as n goes to infinity. Using formulas (51), (52) and (53) we also computed the expected time required for k = 2 selections when n = 10^3: Eτ_1 ≈ 396.25983 and Eτ_2 ≈ 610.54822. Such performance metrics have not been reported before, and our approach illustrates the simplicity with which they can be computed.
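A sketch of the analogous direct dynamic program for this problem (our formulation; here a selection is forced when the remaining observations are exactly as many as the remaining selections, since exactly k choices must be made):

```python
def avg_rank_k_value(n, k):
    """Minimal expected average rank of k selections.  DP over
    (time, selections left) with Y_t(r) = -(n+1) r/(t+1), R_t uniform on 1..t;
    maximizing the sum of the negative Y's minimizes the sum of ranks."""
    V = [0.0] * (k + 1)                       # V[s] at time t+1
    for t in range(n, 0, -1):
        remaining = n - t + 1
        newV = [0.0] * (k + 1)
        for s in range(1, k + 1):
            acc = 0.0
            for r in range(1, t + 1):
                y = -(n + 1) * r / (t + 1)
                if s >= remaining:
                    acc += y + V[s - 1]       # must select now
                else:
                    acc += max(y + V[s - 1], V[s])
            newV[s] = acc / t
        V = newV
    return -V[k] / k                          # expected average rank
```

For k = 1 this reproduces the single-choice expected-rank values; for n = k = 2 both items must be taken and the average rank is exactly 1.5.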

Miscellaneous problems
The next two examples illustrate the applicability of the proposed framework to some other problems of optimal stopping.

Moser's problem with random horizon
This is Problem (P11) of Section 2. For any stopping time τ ∈ T(X) the original stopping problem is equivalent to the problem of stopping the sequence of independent random variables Y_t = (X_t − μ) Σ_{k=t}^{N_max} γ_k, t = 1, . . . , N_max. The distribution of Y_t is F_t(z) = G(μ + z/σ_t), t = 1, . . . , N_max, where σ_t := Σ_{k=t}^{N_max} γ_k. Then, applying Corollary 1, we obtain the optimal stopping rule. In particular, if G is the uniform [0, 1] distribution then straightforward calculation yields b_2 = 0 and an explicit recursion for the subsequent b_t. The optimal value of the problem is V*(N_max) = b_{N_max+1} + 1/2. It is worth noting that the case γ_k = 0 for all k = 1, . . . , N_max − 1 and γ_{N_max} = 1 corresponds to the original Moser problem with fixed horizon N_max. In this case σ_t = 1 for all t, and the above recursive relationship coincides with the one in Moser (1956).
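In the fixed-horizon uniform case the recursion collapses to Moser's classical one, u_{m+1} = E max(X, u_m) = (1 + u_m^2)/2 with u_1 = 1/2; in the centered notation of the text, b_m = u_m − 1/2 and V* = b_{n+1} + 1/2. A minimal sketch:

```python
def moser_value(n):
    """Optimal E X_tau for n i.i.d. Uniform[0,1] observations (Moser's
    fixed-horizon problem): u_{m+1} = (1 + u_m^2)/2, starting from
    u_1 = E X = 1/2 (the value with a single observation left)."""
    u = 0.5
    for _ in range(n - 1):
        u = (1 + u * u) / 2
    return u
```

For n = 2 the value is 0.625 (stop on the first observation iff it exceeds E X_2 = 1/2), and the values increase to 1 as n grows.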
Taking into account that u_{n+1} = b_{n+1}(Π_{j=1}^{n} q_j)^{−1}, we finally obtain the optimal value of the problem; these results coincide with the statement of Theorem 1 in Bruss (2000).

Concluding remarks
We close this paper with several remarks.
1. In this paper we show that numerous problems of sequential selection can be reduced to the problem of stopping a sequence of independent random variables with carefully specified distribution functions. In terms of computational complexity, we cannot assert that in all cases our approach leads to a more efficient algorithm than a dynamic programming recursion tailored for a specific problem instance. However, in contrast to the latter, in many cases of interest we are able to derive explicit recursive relationships that can be easily implemented; see, e.g., Problem (P5) that has not been solved to date, or Problems (P6) and (P7) for which our approach provides explicit expressions for computation of optimal policies under arbitrary distribution of the horizon length. The conditioning argument leads to rules expressed in terms of "sufficient statistics"; such rules are very natural, simple, and easy to interpret.
2. The proposed framework is applicable to sequential selection problems that can be reduced to settings with independent observations and additive reward function. In addition, it is required that the number of selections to be made is fixed and does not depend on the observations. As the paper demonstrates, this class is rather broad. In particular, it includes selection problems with no-information, rank-dependent rewards and fixed or random horizon. The framework is also applicable to selection problems with full information when the random variables {X t } are observable, and the reward for stopping at time t is a function of the current observation X t only. It is worth noting that in all these problems the optimal policy is of the memoryless threshold type. In addition, we demonstrate that multiple choice problems with fixed and random horizon and additive reward, as well as sequential assignment problems with independent job sizes and random horizon, are also covered by the proposed framework. In particular, variants of problems (P9), (P10) and (P12) with random horizon can also be solved using the proposed approach.
3. Although the approach holds for a broad class of sequential selection problems, there are settings that do not belong to the indicated class. For instance, settings with rank-dependent reward and full information, as in Gilbert and Mosteller (1966, Section 3) and Gnedin (2007), cannot be reduced to optimal stopping of a sequence of independent random variables. A prominent example of such a setting is the celebrated Robbins' problem of minimizing the expected rank on the basis of full information. This problem is still open, and only bounds on the asymptotic optimal value are available in the literature. Remarkably, Bruss & Ferguson (1996) show that no memoryless threshold rule can be optimal in this setting, and the optimal stopping rule must depend on the entire history.
4. The proposed approach is not applicable to settings where the number of selections is not fixed and depends on the observations. This class includes problems of maximizing the number of selections subject to some constraints; for representative publications in this direction we refer, e.g., to Samuels & Steele (1981), Coffman et al. (1987), Gnedin (1999), Arlotto et al. (2015) and references therein. Another example is the multiple choice problem with zero-one reward; see, e.g., Rose (1982a) and Vanderbei (1980), where the problem of maximizing the probability of selecting the k best alternatives was considered. The fact that the results of Derman, Lieberman & Ross (1972) are not applicable to the latter problem was already observed by Rose (1982a), who mentioned this explicitly.