After sequencing the genomes of his cows, Farmer John has moved onto genomic editing! As we know, a genome can be represented by a string consisting of As, Cs, Gs, and Ts. The maximum length of a genome under consideration by Farmer John is 105105.
Farmer John starts with a single genome and edits it by performing the following steps:
For example, if FJ started with the genome AGGCTTT then he would perform the following steps:
Unfortunately, after editing the genome, Farmer John's computer crashed and he lost the sequence of the genome he originally started with. Furthermore, some parts of the edited genome have been damaged, meaning that they have been replaced by question marks.
Given the sequence of the edited genome, help FJ determine the number of possibilities for the original genome, modulo 109+7109+7.
A non-empty string where each character is one of A, G, C, T, or ?.
The number of possible original genomes modulo 109+7109+7.
?
4
The question mark can be any of A, G, C, or T.
GAT?GTT
3
There are two possible original genomes aside from AGGCTTT, which was described above.
AGGATTT -> AG | GAT | T | T -> GA | TAG | T | T -> GATAGTT TAGGTTT -> TAG | GT | T | T -> GAT | TG | T | T -> GATTGTT
Problem credits: Benjamin Qi
[/hide]
(Analysis by Danny Mittal)
For convenience, let SS be the given string.
Subtask 1 (|S|≤10|S|≤10)
There are at most 410≈106410≈106 possible genomes, so we can just try each one to see if it produces something consistent with the input after Farmer John's edits. This runs in O(|S|4|S|)O(|S|4|S|).
Subtask 2 (|S|≤102|S|≤102)
Consider dividing the result of Farmer's John's edits back into the substrings of the genome after he had reversed them (but before he had concatenated them). Such a division could only correspond to one starting genome, which we could get by reversing the substrings back and concatenating them. However, not every division is valid (corresponds to some starting genome). In order for a division to be valid, it must satisfy two properties:
Given these conditions, we can solve this problem using DP. It is useful to first compute a preliminary DP that for each substring S[j…k]S[j…k] and letters l,ml,m computes the amount of ways to replace the question marks in s[j…k]s[j…k] with letters so that s[j…k]s[j…k] starts with ll, ends with mm, and satisfies our second condition, i. e. doesn't contain any two adjacent equal letters. This DP is fairly straightforward to compute in O(|S|2)O(|S|2).
After that, we will perform our main DP, where for each index kk and letter ll, we compute the number of divisions of S[1…k]S[1…k] such that the last substring in the division begins with ll. To transition, we try each possible ending index jj of the previous substring in our division. By making use of our preliminary DP values for s[j+1…k]s[j+1…k], we can do this in constant time for each (j,k)(j,k), making this DP, and thus the overall algorithm, O(|S|2)O(|S|2) as well.
Subtask 3 (|S|≤105|S|≤105)
To solve this subtask, we will optimize our solution to the previous subtask.
In our new DP, we will instead, for each kk, count divisions of S[1…k]S[1…k] which may be incomplete, in that the last letter of the last substring may not match the first character of the second-to-last substring. This means that in addition to keeping track of the first letter ll of the last substring, we also need to keep track of the last letter mm of the last substring as well as the first letter nn of the second-to-last substring. We will refer to said DP value as dpk,n,l,mdpk,n,l,m.
The optimization comes from the transitions necessary for this DP. Instead of going through all previous indexes jj, our transition is either extending the last substring in our division up to k−1k−1 by one character, or adding a new substring consisting of a single character to our division up to k−1k−1.
In the first type of transition, the letters nn and ll stay the same, and the last two letters in the substring we are extending must not be equal in order to satisfy our second condition, so we simply add dpk−1,n,l,mdpk−1,n,l,m to dpk,n,l,m′dpk,n,l,m′ for all n,l,m,m′n,l,m,m′ such that m≠m′m≠m′ and m′m′ is a valid letter for SkSk.
In the second type of transition, the old values of nn and mm must match in order to satisfy our first condition, so we add dpk−1,m,l,mdpk−1,m,l,m to dpk,l,m′,m′dpk,l,m′,m′ for all m,l,m′m,l,m′ such that m′m′ is a valid letter for SkSk.
To compute our final answer, we simply need to ensure that we have n=mn=m, so our answer is the sum over dp|S|,m,l,mdp|S|,m,l,m for all m,lm,l.
For each kk, we compute at most 44+4344+43 transitions, which is constant (and still small enough), making our algorithm O(|S|)O(|S|).
import java.io.BufferedReader; import java.io.IOException; import java.io.InputStreamReader; public class BovineGenomicsII { public static final long MOD = 1000000007; public static final char[] BASES = "AGCT".toCharArray(); public static void main(String[] args) throws IOException { BufferedReader in = new BufferedReader(new InputStreamReader(System.in)); char[] s = in.readLine().toCharArray(); long[][][][] dp = new long[s.length][4][4][4]; for (int n = 0; n < 4; n++) { for (int l = 0; l < 4; l++) { if (s[0] == '?' || s[0] == BASES[l]) { dp[0][n][l][l] = 1L; } } } for (int k = 1; k < s.length; k++) for (int m2 = 0; m2 < 4; m2++) if (s[k] == '?' || s[k] == BASES[m2]) { for (int n = 0; n < 4; n++) { for (int l = 0; l < 4; l++) { for (int m = 0; m < 4; m++) { if (m != m2) { dp[k][n][l][m2] += dp[k-1][n][l][m]; dp[k][n][l][m2] %= MOD; } if (n == m) { dp[k][l][m2][m2] += dp[k-1][n][l][m]; dp[k][l][m2][m2] %= MOD; } } } } } long answer = 0; for (int l = 0; l < 4; l++) { for (int m = 0; m < 4; m++) { answer += dp[s.length - 1][m][l][m]; } } answer %= MOD; System.out.println(answer); } }
[/hide]
翰林课程体验,退费流程快速投诉邮箱: yuxi@linstitute.net 沪ICP备2023009024号-1