# GCJ – Consonants

Consonants – Round 1C 2013

The first thing to notice about this problem is that in order to identify substrings containing at least n consonants we will need to identify any stretches of exactly n consonants in the word.  Obviously, if there are no such stretches in the word, then there are no possible substrings containing consecutive consonants.

Suppose instead we find exactly one such substring, as with the sample word quartz with an n-value of 3.

A simple sample word.

The only valid stretch here is from index 3 to 5. How many total substrings can be made that contain this stretch of consonants? Well, we can vary the start of the substring from 0 to 3 and the end of the substring from 5 to 5, and both the start and the end of the substring can be varied independently, which gives us $4 \cdot 1 = 4$ total possible substrings containing this stretch.

We can then generalize this to an arbitrary word of length L, containing only a single stretch of consecutive consonants.  The stretch begins at index i and ends at index j, where $j - i = n - 1$. Then there are $(i + 1)$ possible positions for the start of substrings containing this stretch (all indices from 0 to i inclusive) and $L - j$ possible positions for the end of substrings containing this stretch (all indices from j to L – 1 inclusive). Since these choices of start and end positions are independent, that gives us $(i + 1)(L - j)$ possible substrings.

Now we need to expand our analysis to words containing multiple stretches of consonants.

A slightly more complex sample word.

In the word straight we have two disjoint consonant stretches of length n=3, [0, 2] and [5, 7]. Moving from left to right, we can start by applying our previous analysis to the first substring, giving us 6 possible substrings containing the stretch [0, 2]. Applying the same calculation to the second stretch, we calculate 6 substrings containing it. So we can add these and get a total of 12, right? Well, no, as you can see from the problem statement the correct answer is 11, because we double-counted the substring [0, 7], which contains both stretches of consonants.

So for stretches after the first, we want to calculate not the number of substrings containing it, but the number of substrings that both contain it and do not contain any previously analyzed substrings. Since we’re analyzing substrings from left to right, we can do this by running the left index of the substring not from 0 to i but instead from k + 1 to i, where k is the start index of the previous substring. That gives us $(i - k)(L - j)$ new substrings containing this stretch of consonants. If we do this with i, j, k = 5, 7, 0 for the second substring, we get 5, which gives us the correct answer when added to the 6 substrings containing the first substring.

A sample word with overlapping stretches of consonants.

Now we want to check and make sure that we get the correct answer for words with overlapping stretches of n consonants. The word gcj contains two such stretches, [0, 1] and [1, 2]. With our methods, we obtain 2 substrings containing the first and 1 containing the second but not the first, which is the correct answer.

Finally, we need to quickly identify all such stretches in the word from left to right. We can do this by iterating through each letter in a word. If this letter is a consonant, we increment the size of the current stretch of consonants, while capping the size at n. If it is not a consonant, we set the size to 0. Then, if the size is n, we are located at the end of a stretch of n consonants, and we should increment our count of substrings by $(i - k)(L - j)$.

However, we are only keeping track of the current position, which is at the end of the active stretch of consonants, or k, so we need to calculate i in terms of k, which is simply $i = j - n + 1$. We then substitute this into our last expression, giving the number of additional substrings contributed by each stretch of consonants as $(j - n - k + 1)(L - j)$.

def solve_case(case):
"""Take the input data (structured in case) and perform any necessary
calculations to obtain the desired output, formatted as the appropriate
string.
"""

name, n = case
L = len(name)

valid = [x in 'bcdfghjklmnpqrstvwxyz' for x in name]

count = 0
size = 0
k = -1

for j in range(letters):
if valid[j]:
size = min(size + 1, n)
else:
size = 0
if size == n:
count += (j - n - k + 1) * (L - j)
k = j - n + 1

return count