Saturday, May 18, 2013

Google Codejam 2013 Round 1C

Problem A. Consonants

http://code.google.com/codejam/contest/2437488/dashboard#s=p0

In brief, given a string \(S\) return the total number of substrings containing at least \(N\) consecutive consonants. The vowels are define as a, e, i, o, and u, and our consonants are the remaining 21 letters.

Let's begin by solving the small input of this problem, because the size of the string \(L\) can be at most 100. We just simple iterate over all the possibles substrings of the given string. For each substring check if contains at least one occurrence of \(N\) consecutive consonants, if that's the case we count that substring in our final answer. Things get a little bit more interesting when we are facing the large input, the size of the given string \(L\) can be at most \(10^{6}\). Clearly, the previous algorithm is not going to be fast enough. So we need a different approach to attack this problem.

First let's get some intuition with the following example:

S =  "quxrtz"
L =  6
N = 2

Imagine that we are iterating our string, until we find the substring "xr" which has \(N = 2\) consecutive consonants:

Now we should ask ourselves the following question, how many substrings  contains the string "xr"?

Listing all the substring of the given string S, we have in total 9 of them:

{"q", "u", "x", "r", "t", "z", "qu", "ux", "xr", "rt", "tz", "qux", "uxr",
"xrt", "rtz", "quxr", "uxrt", "xrtz", "quxrt", "uxrtz", "quxrtz"}

Where this answer come from? let's imagine for a second that we are building all the substrings that contains the string "xr". To do that we could either add characters to the front or to the end of "xr".

In the case of the left side we have 3 different possibilities, we can add "u", "qu" or "" (the empty string) to the front of "xr". In the same manner, in the case of the right side we also have 3 different possibilities, we can add "t", "tz" or "" (the empty string) to the end of the string "xr". By the rule of product we have \(3 \cdot  3 = 9\) substring including the string "xr".

Formalizing more the previous statement, let's call \(i\) the index of the last consonant that we found (0-based), and, let's call \(C\) the count of consecutive consonants so far. If the condition \(C \geq N\) holds, our count of substring that include this \(N\) consonants is given by the expression \((L - i) \cdot (i - N + 1)\).

There is just one problem remaining, what happen then if we apply the same method to the next N consecutive consonants:

We end up double counting some of the substrings (in red color the substrings that we have counted twice):

{"q", "u", "x", "r", "t", "z", "qu", "ux", "xr", "rt", "tz", "qux", "uxr",
"xrt", "rtz", "quxr", "uxrt", "xrtz", "quxrt", "uxrtz", "quxrtz"}

The solution to this issue is simple, instead of taking the whole left part of our string, we just take the part that we haven't count yet, we can do that by maintaining a pointer to the last \((i - N + 1)\) seen so far.

Adding this details left, we end up with the following expression \((L - i) \cdot (i - N + 1 - P)\) where \(P\) is the pointer to the last \((i - N + 1)\).

No comments: