USACO 2020 January Contest, Platinum Problem 2. Non-Decreasing Subsequences

Category: International Competitions, International Computing Competitions Date: July 1, 2022, 6:27 PM

Bessie was recently taking a USACO contest and encountered the following problem. Of course, Bessie knows how to solve it. But do you?

Consider a sequence $A_{1}, A_{2}, \dots, A_{N}$ of length $N$ $(1 \leq N \leq 5 \cdot 10^{4})$ consisting solely of integers in the range $1 \dots K$ $(1 \leq K \leq 20)$. You are given $Q$ $(1 \leq Q \leq 2 \cdot 10^{5})$ queries of the form $[L_{i}, R_{i}]$ $(1 \leq L_{i} \leq R_{i} \leq N)$. For each query, compute the number of non-decreasing subsequences of $A_{L_{i}}, A_{L_{i} + 1}, \dots, A_{R_{i}}$ mod $10^{9} + 7$.

A non-decreasing subsequence of $A_{L}, \dots, A_{R}$ is a collection of indices $(j_{1}, j_{2}, \dots, j_{x})$ such that $L \leq j_{1} < j_{2} < \dots < j_{x} \leq R$ and $A_{j_{1}} \leq A_{j_{2}} \leq \dots \leq A_{j_{x}}$. Make sure to consider the empty subsequence!

SCORING:

Test cases 2-3 satisfy $N \leq 1000$ .
Test cases 4-6 satisfy $K \leq 5.$
Test cases 7-9 satisfy $Q \leq 10^{5} .$
Test cases 10-12 satisfy no additional constraints.

INPUT FORMAT (file nondec.in):

The first line contains two space-separated integers $N$ and $K$.

The second line contains $N$ space-separated integers $A_{1}, A_{2}, \dots, A_{N}$.

The third line contains a single integer $Q .$

The next $Q$ lines each contain two space-separated integers $L_{i}$ and $R_{i} .$

OUTPUT FORMAT (file nondec.out):

For each query $[L_{i}, R_{i}]$, you should print the number of non-decreasing subsequences of $A_{L_{i}}, A_{L_{i} + 1}, \dots, A_{R_{i}}$ mod $10^{9} + 7$ on a new line.

SAMPLE INPUT:

SAMPLE OUTPUT:

3
4
20

For the first query, the non-decreasing subsequences are $()$, $(2)$, and $(3)$. $(2, 3)$ is not a non-decreasing subsequence because $A_{2} \not\leq A_{3}$.

For the second query, the non-decreasing subsequences are $()$, $(4)$, $(5)$, and $(4, 5)$.

Problem credits: Benjamin Qi

Solution to USACO 2020 January Contest, Platinum Problem 2. Non-Decreasing Subsequences (provided by Hanlin International Education, for reference only)

(Analysis by Benjamin Qi)

Let $MOD = 10^{9} + 7$. General optimization tips:

Declare $MOD$ as const.
Avoid using % when adding or subtracting two integers modulo $MOD$.
Regarding the matrices mentioned below, use 2D arrays of fixed size (rather than a vector of vectors in C++).
Don't iterate over matrix entries that must equal zero (those below the main diagonal).

It also helps to declare a separate class (or struct in C++) to take care of modular arithmetic operations.
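As a rough illustration of the second tip, the sketch below replaces `%` with a single comparison for addition and subtraction. The helper names `addMod`, `subMod` and `mulMod` are hypothetical (they do not appear in either solution below), and both arguments are assumed to already lie in $[0, MOD)$.

```cpp
const int MOD = 1000000007;

// Assumption: 0 <= a, b < MOD on entry to every helper.
inline int addMod(int a, int b) {
    int s = a + b;                    // at most 2*MOD - 2, which still fits in a 32-bit int
    return s >= MOD ? s - MOD : s;    // one conditional subtraction instead of %
}

inline int subMod(int a, int b) {
    int d = a - b;                    // at least -(MOD - 1)
    return d < 0 ? d + MOD : d;
}

inline int mulMod(int a, int b) {
    // The product needs 64 bits, so % is still used here.
    return static_cast<int>(static_cast<long long>(a) * b % MOD);
}
```

For example, `addMod(MOD - 1, MOD - 1)` yields `MOD - 2` and `subMod(0, 1)` yields `MOD - 1`.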

For the sake of convenience, we'll assume that all numbers are in $[0, K)$ rather than $[1, K]$. Also note that later sections use variables referenced in previous ones (so read in order).

Subtask 1:

We can compute the answer for every pair $(L, R)$ satisfying $1 \leq L \leq R \leq N$ in $O(N^{2} K)$ time by trying each index of the sequence as $L$, setting $R = L$, and then repeatedly incrementing $R$. We maintain an array $tot$ of size $K$, where $tot[i]$ stores the number of non-decreasing subsequences with last element $i$ for each $0 \leq i < K$, and update it after adding each element of the sequence. (Consider the empty subsequence as having last element $0$.) After this, we answer each of the $Q$ queries in $O(1)$ time.
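The update described above can be sketched as follows. This is only an illustration under the stated convention (values in $[0, K)$, 0-indexed positions); the function name `answersFrom` is made up here. For a fixed left endpoint it returns the answer for every right endpoint, so calling it once per left endpoint fills the whole table in $O(N^{2} K)$.

```cpp
#include <vector>

const long long MOD = 1000000007;

// For a fixed left endpoint L, returns res[t] = number of non-decreasing
// subsequences (empty one included) of a[L..L+t], for every valid t.
std::vector<int> answersFrom(const std::vector<int>& a, int L, int K) {
    std::vector<long long> tot(K, 0);
    tot[0] = 1;                       // the empty subsequence counts as ending in 0
    long long total = 1;              // running sum of tot[0..K-1]
    std::vector<int> res;
    for (int R = L; R < static_cast<int>(a.size()); ++R) {
        long long add = 0;
        for (int v = 0; v <= a[R]; ++v) add += tot[v];  // every subsequence ending in a value <= a[R] can take a[R]
        add %= MOD;
        tot[a[R]] = (tot[a[R]] + add) % MOD;
        total = (total + add) % MOD;
        res.push_back(static_cast<int>(total));
    }
    return res;
}
```

On the array `{0, 1, 0, 0, 1}` with `K = 2`, the call `answersFrom(a, 0, 2)` gives `{2, 4, 6, 10, 20}`.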

SegTree (subtasks 2,3):

Note that adding an element $x$ to the end of the contiguous subsequence $[L, R]$ that we are currently considering is equivalent to setting $tot$ equal to $tot \cdot M_{x}$ for a $K \times K$ matrix $M_{x}$, where we treat $tot$ as a $1 \times K$ matrix. For example, when $K = 5$,

$$M_{3} = \begin{bmatrix} 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix},$$

which satisfies

$$\begin{bmatrix} c_{0} & c_{1} & c_{2} & c_{3} & c_{4} \end{bmatrix} \cdot M_{3} = \begin{bmatrix} c_{0} & c_{1} & c_{2} & c_{0} + c_{1} + c_{2} + 2 c_{3} & c_{4} \end{bmatrix}.$$

In other words, if we add 3 to the end of the sequence, the number of subsequences ending with 3 increases by $c_{0} + c_{1} + c_{2} + c_{3}$ while the number of subsequences ending with every other number remains the same.
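To make the matrix view concrete, here is a small sketch that builds $M_{x}$ and multiplies a row vector by it. The names `makeM` and `applyRow` are illustrative only. The inner loop starts at the diagonal, since every entry below it is zero.

```cpp
#include <vector>

typedef std::vector<std::vector<long long>> Matrix;
const long long MOD = 1000000007;

// M_x: the identity, except that column x holds 1 in rows 0..x-1 and 2 in row x.
Matrix makeM(int x, int K) {
    Matrix m(K, std::vector<long long>(K, 0));
    for (int i = 0; i < K; ++i) m[i][i] = 1;
    for (int i = 0; i < x; ++i) m[i][x] = 1;
    m[x][x] = 2;
    return m;
}

// Row vector times upper-triangular matrix, modulo MOD.
std::vector<long long> applyRow(const std::vector<long long>& tot, const Matrix& m) {
    int K = static_cast<int>(tot.size());
    std::vector<long long> out(K, 0);
    for (int i = 0; i < K; ++i)
        for (int j = i; j < K; ++j)          // entries below the diagonal are zero
            out[j] = (out[j] + tot[i] * m[i][j]) % MOD;
    return out;
}
```

With `tot = {1, 2, 3, 4, 5}`, `applyRow(tot, makeM(3, 5))` returns `{1, 2, 3, 14, 5}`, matching $c_{0} + c_{1} + c_{2} + 2 c_{3} = 1 + 2 + 3 + 8$.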

This inspires us to build a segment tree. If a vertex represents the interval $[L, R],$ then we should store the matrix $M = M_{A_{L}} \cdot M_{A_{L + 1}} \dots M_{A_{R}} .$ We can multiply two such matrices in $O (K^{3}) .$ Thus, we can build this segment tree in $O (N K^{3}) .$ We can query this segment tree in $O (K^{3} \log N)$ by considering the matrices for the $O (\log N)$ segments covering $[L, R]$ in order and multiplying them.

The time complexity of this approach is $O ((N + Q \log N) K^{3}),$ which may or may not pass subtask 2. Of course, it is possible to speed up both build and query.

Regarding query, we only need to store the entries of the first row of the product. So we're essentially multiplying a $1 \times K$ matrix with a $K \times K$ matrix rather than two $K \times K$ matrices. Thus, each query runs in $O (K^{2} \log N)$ time. This passes subtask 2.
Regarding build, we can store the matrix only for intervals of length at least a certain length, say $K .$ Then for each interval of lesser length, we can just add each of the numbers manually in $O (K)$ time each, so the complexity of query is not affected. The number of $O (K^{3})$ multiplications is reduced by a factor of $K,$ bringing the complexity of build to $O (N K^{2}) .$
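The following is a minimal sketch of the segment tree with the query-side optimization only (a row vector is pushed through the covering nodes, so each node costs $O(K^{2})$). It does not implement the length-$K$ cutoff for build, so it stores a full matrix in every node; at the full limits that is far too much memory, and the sketch is meant for small inputs. The struct and method names are assumptions made for this example.

```cpp
#include <array>
#include <vector>

const int MOD = 1000000007;
const int MAXK = 20;
typedef std::array<std::array<int, MAXK>, MAXK> Mat;
typedef std::array<int, MAXK> Row;

struct SegTree {
    int n, K;
    std::vector<Mat> node;   // node[v] = product of M_{a[i]} over the node's interval, in index order

    SegTree(const std::vector<int>& a, int K_)
        : n(static_cast<int>(a.size())), K(K_), node(4 * a.size()) { build(1, 0, n - 1, a); }

    Mat mul(const Mat& a, const Mat& b) const {      // product of two upper-triangular matrices
        Mat c{};
        for (int i = 0; i < K; ++i)
            for (int k = i; k < K; ++k)
                for (int j = k; j < K; ++j)
                    c[i][j] = static_cast<int>((c[i][j] + 1LL * a[i][k] * b[k][j]) % MOD);
        return c;
    }

    void build(int v, int lo, int hi, const std::vector<int>& a) {
        if (lo == hi) {                               // leaf: the matrix M_{a[lo]}
            for (int i = 0; i < K; ++i) node[v][i][i] = 1;
            for (int i = 0; i < a[lo]; ++i) node[v][i][a[lo]] = 1;
            node[v][a[lo]][a[lo]] = 2;
            return;
        }
        int mid = (lo + hi) / 2;
        build(2 * v, lo, mid, a);
        build(2 * v + 1, mid + 1, hi, a);
        node[v] = mul(node[2 * v], node[2 * v + 1]);
    }

    // Multiplies `row` by the matrices of the nodes covering [l, r], left to right.
    void applyRange(int v, int lo, int hi, int l, int r, Row& row) const {
        if (r < lo || hi < l) return;
        if (l <= lo && hi <= r) {
            Row out{};
            for (int i = 0; i < K; ++i)
                for (int j = i; j < K; ++j)
                    out[j] = static_cast<int>((out[j] + 1LL * row[i] * node[v][i][j]) % MOD);
            row = out;
            return;
        }
        int mid = (lo + hi) / 2;
        applyRange(2 * v, lo, mid, l, r, row);
        applyRange(2 * v + 1, mid + 1, hi, l, r, row);
    }

    // Answer for the 0-indexed inclusive range [l, r], values in [0, K).
    int query(int l, int r) const {
        Row row{};
        row[0] = 1;                                   // the empty subsequence
        applyRange(1, 0, n - 1, l, r, row);
        long long s = 0;
        for (int j = 0; j < K; ++j) s += row[j];
        return static_cast<int>(s % MOD);
    }
};
```

For the array `{0, 1, 0, 0, 1}` with `K = 2`, `query(0, 4)` returns 20, `query(1, 2)` returns 3 and `query(3, 4)` returns 4.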

Both of these optimizations combined may or may not pass subtask 3. I'm not sure whether it is possible to earn full points with this method.

Divide and Conquer (full points):

The segment tree solution would allow updates to the sequence as well. However, there is really no reason to use a segment tree on an array that remains constant.

In fact, given an array $b_{1}, b_{2}, \dots, b_{N}$ and an associative operation $\oplus$ that runs in $O (1)$ time, we can process the array in $O (N \log N)$ time such that any query in the form $b_{l} \oplus b_{l + 1} \oplus \dots \oplus b_{r}$ can be answered in $O (1)$ time.

Let $M = \left\lfloor \frac{1 + N}{2} \right\rfloor$. First we can deal with all query intervals that contain both $M$ and $M + 1$. Suppose that the subsequence contains indices $j_{1} < j_{2} < \dots < j_{a} \leq M < j_{a + 1} < \dots < j_{x}$. Then we can iterate over all $K$ possible values of $A_{j_{a}}$ and generate the number of possible subsequences for all intervals in the form $[i, M]$ or $[M + 1, i]$ independently in $O(N K)$ time for a total of $O(N K^{2})$ time. The answer for a query $[L, R]$ can then be derived from the answers for $[L, M]$ and $[M + 1, R]$ in $O(K)$ time.
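The $O(K)$ combination step can be sketched as below. To stay short, this illustration recomputes the two per-value arrays for a single query rather than tabulating them for every $i$ as the full solution does; the final loop is the part that corresponds to the combination. The function name and the array names `endAt` and `startAt` are hypothetical.

```cpp
#include <vector>

const long long MOD = 1000000007;

// Counts non-decreasing subsequences of a[L..R] (empty one included) by splitting at M,
// where L <= M < R, positions are 0-indexed and values lie in [0, K).
long long combineAcross(const std::vector<int>& a, int L, int M, int R, int K) {
    // endAt[v]:   non-empty subsequences of a[L..M]   whose last  value is v
    // startAt[v]: non-empty subsequences of a[M+1..R] whose first value is v
    std::vector<long long> endAt(K, 0), startAt(K, 0);
    for (int i = L; i <= M; ++i) {
        long long add = 1;                                // a[i] on its own
        for (int v = 0; v <= a[i]; ++v) add += endAt[v];  // or appended to something ending <= a[i]
        endAt[a[i]] = (endAt[a[i]] + add) % MOD;
    }
    for (int i = R; i > M; --i) {
        long long add = 1;                                // a[i] on its own
        for (int v = a[i]; v < K; ++v) add += startAt[v]; // or prepended to something starting >= a[i]
        startAt[a[i]] = (startAt[a[i]] + add) % MOD;
    }
    long long ans = 1;                                    // the empty subsequence
    long long rightGE = 0;                                // right-half subsequences with first value >= v
    for (int v = K - 1; v >= 0; --v) {
        rightGE = (rightGE + startAt[v]) % MOD;
        ans = (ans + endAt[v] * (1 + rightGE)) % MOD;     // left part alone, or joined to a right part
    }
    return (ans + rightGE) % MOD;                         // plus the right-only subsequences
}
```

On `{0, 1, 0, 0, 1}` with `K = 2`, splitting the whole array at `M = 2` gives 20.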

Then we can recursively solve for all queries completely contained within the intervals $[1, M]$ and $[M + 1, N]$ in a similar fashion. If there are no queries left to process for our current interval, we can break immediately. This approach can be improved to run in $O (N \log N \cdot K \log K + Q K)$ time online (though $\log K$ with a high constant is not better than $K$ ).

Dhruv Rohatgi's code ($O(N K^{2} \log N + Q (K + \log N))$ offline):

#include <cstdio>   // freopen
#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;
#define MAXN 200000
#define MAXQ 200000
#define MOD 1000000007
 
int msum(int a)
{
	if(a >= MOD) return a-MOD;
	return a;
}
 
 
int N,K,Q;
int A[MAXN];
int l[MAXQ], r[MAXQ];
int qid[MAXQ];
int qans[MAXQ];
 
int lans[MAXN][21];
int rans[MAXN][21];
int cnt[21];
 
void countLeft(int a,int b)
{
	for(int i=a;i<=b;i++)
		for(int k=1;k<=K;k++)
			lans[i][k] = 0;
	for(int k=K;k>=1;k--)
	{
		for(int j=k;j<=K;j++)
			cnt[j] = 0;
		for(int i=b;i>=a;i--)
		{
			if(A[i] == k)
			{
				cnt[k] = msum(2*cnt[k] + 1);
				for(int j=k+1;j<=K;j++)
					cnt[j] = msum(msum(2*cnt[j]) + lans[i][j]);
			}
			for(int j=k;j<=K;j++)
				lans[i][j] = msum(lans[i][j] + cnt[j]);
		}
	}
}
 
void countRight(int a,int b)
{
	for(int i=a;i<=b;i++)
		for(int k=1;k<=K;k++)
			rans[i][k] = 0;
	for(int k=1;k<=K;k++)
	{
		for(int j=1;j<=k;j++)
			cnt[j] = 0;
		for(int i=a;i<=b;i++)
		{
			if(A[i] == k)
			{
				cnt[k] = msum(2*cnt[k] + 1);
				for(int j=1;j<k;j++)
					cnt[j] = msum(msum(2*cnt[j]) + rans[i][j]);
			}
			for(int j=1;j<=k;j++)
				rans[i][j] = msum(rans[i][j] + cnt[j]);
		}
	}
}
 
int split(int qa,int qb, int m)
{
	int i = qa;
	int j = qb;
	while(i<j)
	{
		if(r[qid[i]] > m && r[qid[j]] <= m)
		{
			swap(qid[i],qid[j]);
			i++, j--;
		}
		else if(r[qid[i]] > m)
			j--;
		else if(r[qid[j]] <= m)
			i++;
		else
			i++, j--;
	}
	if(i > j) return j;
	else if(r[qid[i]] <= m) return i;
	else return i-1;
}
 
void solve(int a,int b,int qa,int qb)
{
	if(a>b || qa>qb) return;
	if(a == b)
	{
		for(int i=qa;i<=qb;i++)
			qans[qid[i]] = 1;
		return;
	}
	int m = (a+b)/2;
	countLeft(a,m);
	countRight(m+1,b);
	for(int i=m+1;i<=b;i++)
		for(int k=K-1;k>=1;k--)
			rans[i][k] = msum(rans[i][k] + rans[i][k+1]);
	int qDone = 0;
	for(int i=qa;i<=qb;i++)
	{
		int q = qid[i];
		if(r[q] > m && l[q] <= m)
		{
			qans[q] = 0;
			for(int k=1;k<=K;k++)
				qans[q] = msum(qans[q] + (lans[l[q]][k]*((long long)rans[r[q]][k]))%MOD);
			for(int k=1;k<=K;k++)
				qans[q] = msum(qans[q] + lans[l[q]][k]);
			qans[q] = msum(qans[q] + rans[r[q]][1]);
			qDone++;
		}
		else if(qDone>0)
			qid[i-qDone] = qid[i];
	}
	qb -= qDone;
	int qm = split(qa,qb,m);
	solve(a,m,qa,qm);
	solve(m+1,b,qm+1,qb);
}
 
int main()
{
	freopen("nondec.in","r",stdin);
	freopen("nondec.out","w",stdout);
	cin >> N >> K;
	for(int i=0;i<N;i++)
		cin >> A[i];
	cin >> Q;
	for(int i=0;i<Q;i++)
	{
		cin >> l[i] >> r[i];
		l[i]--,r[i]--;
		qid[i] = i;
	}
	solve(0,N-1,0,Q-1);
	for(int i=0;i<Q;i++)
		cout << msum(qans[i]+1) << '\n'; // +1 for the empty subsequence, kept below MOD
}

Matrix Inverse (full points):

Let $ipref[x] = M_{A_{x}}^{-1} \cdot M_{A_{x - 1}}^{-1} \cdots M_{A_{1}}^{-1}$ and $pref[x] = M_{A_{1}} \cdot M_{A_{2}} \cdots M_{A_{x}}$, so that $ipref[0]$ and $pref[0]$ are both the identity. It's actually quite easy to compute $M_{x}^{-1}$ given $M_{x}$, as both of them are identity matrices with the exception of column $x$. For example, when $K = 5$,

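$$M_{3}^{-1} = \begin{bmatrix} 1 & 0 & 0 & -1/2 & 0 \\ 0 & 1 & 0 & -1/2 & 0 \\ 0 & 0 & 1 & -1/2 & 0 \\ 0 & 0 & 0 & 1/2 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix},$$

which satisfies

$$\begin{bmatrix} c_{0} & c_{1} & c_{2} & c_{0} + c_{1} + c_{2} + 2 c_{3} & c_{4} \end{bmatrix} \cdot M_{3}^{-1} = \begin{bmatrix} c_{0} & c_{1} & c_{2} & c_{3} & c_{4} \end{bmatrix}.$$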

We can represent the query $[L, R]$ as the product of the matrices corresponding to $A_{L}, A_{L + 1}, \dots, A_{R}$. Then we can rewrite the desired product as $ipref[L - 1] \cdot pref[R]$.

Both $ipref$ and $pref$ can be computed naively for every $i$ in $O(N K^{3})$ time because multiplying two $K \times K$ matrices takes $O(K^{3})$ time. However, $O(N K^{2})$ can be accomplished due to the special structure of the matrices; after all, they each differ from the identity matrix by only one column.
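A sketch of the two structured updates, under the assumption that values are in $[0, K)$: multiplying on the right by $M_{x}$ only changes column $x$, and multiplying on the left by $M_{x}^{-1}$ only changes rows $0$ through $x$. The function names are made up for this example, and the cubic `multiply` exists only to check that the two running products stay inverse to each other.

```cpp
#include <vector>

typedef std::vector<std::vector<long long>> Matrix;
const long long MOD = 1000000007;
const long long INV2 = (MOD + 1) / 2;   // 2 * INV2 is congruent to 1 modulo MOD

Matrix identity(int K) {
    Matrix m(K, std::vector<long long>(K, 0));
    for (int i = 0; i < K; ++i) m[i][i] = 1;
    return m;
}

// P <- P * M_x in O(K * x): column x becomes 2*P[j][x] + sum of P[j][k] over k < x.
void appendValue(Matrix& P, int x) {
    int K = static_cast<int>(P.size());
    for (int j = 0; j < K; ++j) {
        long long s = 0;
        for (int k = 0; k <= x; ++k) s += P[j][k];
        P[j][x] = (P[j][x] + s) % MOD;
    }
}

// Q <- M_x^{-1} * Q in O(K * x): rows above x lose half of row x, then row x is halved.
void prependInverse(Matrix& Q, int x) {
    int K = static_cast<int>(Q.size());
    for (int j = 0; j < x; ++j)
        for (int k = 0; k < K; ++k)
            Q[j][k] = ((Q[j][k] - INV2 * Q[x][k]) % MOD + MOD) % MOD;
    for (int k = 0; k < K; ++k)
        Q[x][k] = Q[x][k] * INV2 % MOD;
}

// Plain O(K^3) product, used here only for verification.
Matrix multiply(const Matrix& a, const Matrix& b) {
    int K = static_cast<int>(a.size());
    Matrix c(K, std::vector<long long>(K, 0));
    for (int i = 0; i < K; ++i)
        for (int k = 0; k < K; ++k)
            for (int j = 0; j < K; ++j)
                c[i][j] = (c[i][j] + a[i][k] * b[k][j]) % MOD;
    return c;
}
```

Feeding the values `0, 1, 0, 0, 1` (with `K = 2`) to both updates, starting from the identity, leaves `multiply(Q, P)` equal to the identity, and the first row of `P` sums to 20.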

The answer for each query is equal to $\sum_{i = 0}^{K - 1} (ipref[L - 1] \cdot pref[R])[0][i]$, which can be computed in $O(K^{2})$ time. In fact, this can be sped up to $O(K)$ time because we can rewrite this sum as

$$\sum_{i = 0}^{K - 1} ipref[L - 1][0][i] \cdot \left( \sum_{j = 0}^{K - 1} pref[R][i][j] \right).$$

So we can store $ipref[L][0][i]$ for each $L, i$ in a 2D array which we'll call "isto" and $\sum_{j = 0}^{K - 1} pref[R][i][j]$ for each $R, i$ in another 2D array which we'll call "sto" in the code below. This is clearly superior to storing $N$ matrices of size $K \times K$. Overall, this approach runs in $O(N K^{2} + Q K)$ time (and $N K^{2}$ can be improved to $N K \log K$).

My code follows.

#include <bits/stdc++.h>
using namespace std;
 
typedef long long ll;
const int MOD = 1e9+7; // 998244353; // = (119<<23)+1
const int MX = 5e4+5; 

void setIO(string name) {
	ios_base::sync_with_stdio(0); cin.tie(0);
	freopen((name+".in").c_str(),"r",stdin);
	freopen((name+".out").c_str(),"w",stdout);
}
 
struct mi {
	int v; explicit operator int() const { return v; }
	mi(ll _v) : v(_v%MOD) { v += (v<0)*MOD; }
	mi() : mi(0) {}
};
mi operator+(mi a, mi b) { return mi(a.v+b.v); }
mi operator-(mi a, mi b) { return mi(a.v-b.v); }
mi operator*(mi a, mi b) { return mi((ll)a.v*b.v); }
typedef array<array<mi,20>,20> T;
 
int N,K,Q;
vector<int> A;
array<mi,20> sto[MX], isto[MX];
mi i2 = (MOD+1)/2;
 
void prin(T& t) { // print a matrix for debug purposes
	for (int i = 0; i < K; ++i) {
		for (int j = 0; j < K; ++j) 
			cout << t[i][j].v << ' ';
		cout << "\n";
	}
	cout << "-------\n";
}
 
int main() {
	setIO("nondec");
	cin >> N >> K; A.resize(N); 
	for (int i = 0; i < N; ++i) cin >> A[i];
	T STO, ISTO;
	for (int i = 0; i < K; ++i) 
		STO[i][i] = ISTO[i][i] = 1;
	for (int i = 0; i <= N; ++i) {
		for (int j = 0; j < K; ++j) 
			for (int k = j; k < K; ++k) 
				sto[i][j] = sto[i][j]+STO[j][k];
		for (int k = 0; k < K; ++k) 
			isto[i][k] = ISTO[0][k];
		if (i == N) break;
		int x = A[i]-1;
		// STO goes from pre[i] to pre[i+1]
		// set STO = STO*M_{A[i]}
		for (int j = 0; j <= x; ++j) 
			for (int k = x; k >= j; --k) 
				STO[j][x] = STO[j][x]+STO[j][k];
		// ISTO goes from ipre[i] to ipre[i+1]
		// set ISTO=M_{A[i]}^{-1}*ISTO
		for (int j = 0; j < x; ++j) 
			for (int k = x; k < K; ++k)
				ISTO[j][k] = ISTO[j][k]-i2*ISTO[x][k];
		for (int k = x; k < K; ++k) 
			ISTO[x][k] = ISTO[x][k]*i2;
	}
	cin >> Q;
	for (int i = 0; i < Q; ++i) {
		int L,R; cin >> L >> R;
		mi ans = 0; 
		for (int j = 0; j < K; ++j) 
			ans = ans+isto[L-1][j]*sto[R][j];
		cout << ans.v << "\n";
	}
}

Here is a problem which uses a similar concept in two dimensions (albeit with smaller matrices).