GCJ – Garbled Email

Garbled Email, Round 1B 2013.

This is a tricky problem.  The first tricky part is figuring out an efficient way to store the dictionary so that we can quickly identify whether a particular substring is a prefix of a word in the dictionary.  This is a well-studied problem, and the right data structure is a trie.  Suppose we’re working with a small dictionary containing the words a, aac, acb, baa, bbc, bc, caba, caab.  We’ll start our trie with an empty node, and then keep adding words until we’re finished.  We start by adding the word a.  We link a node with the letter a to the root, and then link a marker node to that to show that this is the end of a word.

A simple trie.

Next, we’ll add in the word aac. We already have the letter a linked to the root, so we go there and add more letters to that node.

A larger trie.

And we repeat this until all the words are in the tree. For each letter in the new word, if that letter is already a child of the current node, we walk to that child. If not, we create a new child. Once we’re at the end of the word, we add a marker child to the last node denoting the end of the word. After we’re finished with our small dictionary, we end up with this:

A full trie.

Each of the red nodes marks the end of a word, so we can see that we have all eight words in the trie.

Now that we have our data stored in an easily-accessible manner, let’s start with a simpler problem than the one asked.  Suppose that there was no garbling, just spaces removed, and we’re given some string, say, aacaabbc.  How can we use this structure to quickly find out whether this string could be divided up into words from our dictionary?  Start at the root of the tree and look at the first letter a of the string.  a is a child of our current node, so we’ll walk to that node and move to the next letter.  Here we see an end node as a child, marking that we’ve now finished a possible node.  So we know we can split off the word a from our string, leaving acaabbc behind.  This split will produce a valid word separation for the whole string if and only if acaabbc can itself be divided up into words from our dictionary.

First, though, we need to finish walking down and make sure we’ve found the only possible split.  The next letter is a, which is a child of our current node, so we move there.  This node has no end node as a child, so it’s not a possible word separation.  The third letter is c, which is a child of this node, so we’ll move there next.  This node is another word separation, giving us aac as the first word and aabbc as the remainder of the string.  There are no other children of this node, so these are the only possible word separations.

So we now know that aacaabbc is divisible into words if and only if at least one of its two suffixes acaabbc and aabbc are divisible into words.  This suggests that we might be able to use dynamic programming for our problem.  We can create an array of boolean variables telling us whether each suffix of our string can be divided into words.  We can look at the suffixes of our string in order of increasing size and calculate whether each of them is divisible.  The larger suffixes can then use the calculations previously done on the smaller suffixes to quickly calculate whether they themselves are divisible.

Dynamic programming results.

Starting with the smallest suffix, c, we quickly find that it is not a word, so we mark that index as False.  bc, the next smallest, so we mark it as True, and similarly for bbc.  abbc gives a potential split at a|bbc, and we know that bbc is True, so abbc must be True as well.  We then keep working backwards until we reach the beginning of the string, and then the final answer is simply the result in the first element of our array.  Doing this we quickly find that aacaabba is in fact divisible.

So this works for messages where there is no garbling at all.  Now let’s add the garbling in and see if we can work out a solution.  We’ll simplify the problem slightly, though, by allowing any character in the string to be garbled, ignoring the constraints on how close the garbled characters are allowed to be.  Suppose we have the string abbbcbba.  We can work backwards, just as before, from smaller to larger suffixes.  This time, though, instead of only storing whether or not a suffix is divisible into words, we’ll store how many changes are needed to make it divisible.

Dynamic programming with counts.c alone, for instance, isn’t a word, but we can change it into a with a single substitution, so our answer is one.  The next two suffixes, bc and bbc, are both words, so they get values of zero.  cbbc isn’t valid, but if we swap it to abbc we can split it at a|bbc, giving us two valid words, for an answer of one.  We keep working backwards just as before, looking for all possible words we can construct at the start of our suffix, splitting it, and then putting the minimum changes required into our array.  Once we’re finished, the final answer is just the result in the first element of our array.  So we find that abbbcbbc can be made divisible with only a single character swap.

Finally, we want to incorporate the closeness requirements for our garbled characters – changes must have at least four non-garbled characters between them.  Now for each suffix we want to store five different values, giving the minimum changes required for this suffix when the first x characters are required to match perfectly, with x ranging between 0 and 4 inclusive.  Let’s look at this array for the string abbbcbba.

Final dynamic programming array.

Here N is marking divisions that are not possible at all.  Otherwise, we fill the array with the number of changes required.  Again, let’s start at the back and move towards the front of the string.  a, the last suffix, is a word itself, so we fill its column with zeros.  ba is not a word, but could be divisible if we change one of the characters, to either bc or a|a.  The bottom three rows require us to keep the first two characters unchanged, so those entries are marked as N.  Both of the top two are possible with only a single change.  bba is valid only if changed into bbc, which requires no more than two characters held unchanged, so we mark up the column appropriately.  We continue in the same manner, looking for potential word divisions and marking down the minimum changes required that allow the required distance between changes.  Then, just as with the simpler problems, we read off our final answer as the entry in the first column and the first row.   Continue reading

Advertisements

GCJ – Consonants

Consonants – Round 1C 2013

The first thing to notice about this problem is that in order to identify substrings containing at least n consonants we will need to identify any stretches of exactly n consonants in the word.  Obviously, if there are no such stretches in the word, then there are no possible substrings containing consecutive consonants.

Suppose instead we find exactly one such substring, as with the sample word quartz with an n-value of 3.

A simple sample word.

A simple sample word.

The only valid stretch here is from index 3 to 5. How many total substrings can be made that contain this stretch of consonants? Well, we can vary the start of the substring from 0 to 3 and the end of the substring from 5 to 5, and both the start and the end of the substring can be varied independently, which gives us 4 \cdot 1 = 4 total possible substrings containing this stretch.

We can then generalize this to an arbitrary word of length L, containing only a single stretch of consecutive consonants.  The stretch begins at index i and ends at index j, where j - i = n - 1 . Then there are (i + 1) possible positions for the start of substrings containing this stretch (all indices from 0 to i inclusive) and L - j possible positions for the end of substrings containing this stretch (all indices from j to L – 1 inclusive). Since these choices of start and end positions are independent, that gives us (i + 1)(L - j) possible substrings.

Now we need to expand our analysis to words containing multiple stretches of consonants.

A slightly more complex sample word.

A slightly more complex sample word.

In the word straight we have two disjoint consonant stretches of length n=3, [0, 2] and [5, 7]. Moving from left to right, we can start by applying our previous analysis to the first substring, giving us 6 possible substrings containing the stretch [0, 2]. Applying the same calculation to the second stretch, we calculate 6 substrings containing it. So we can add these and get a total of 12, right? Well, no, as you can see from the problem statement the correct answer is 11, because we double-counted the substring [0, 7], which contains both stretches of consonants.

So for stretches after the first, we want to calculate not the number of substrings containing it, but the number of substrings that both contain it and do not contain any previously analyzed substrings. Since we’re analyzing substrings from left to right, we can do this by running the left index of the substring not from 0 to i but instead from k + 1 to i, where k is the start index of the previous substring. That gives us (i - k)(L - j) new substrings containing this stretch of consonants. If we do this with i, j, k = 5, 7, 0 for the second substring, we get 5, which gives us the correct answer when added to the 6 substrings containing the first substring.

A sample word with overlapping stretches of consonants.

A sample word with overlapping stretches of consonants.

Now we want to check and make sure that we get the correct answer for words with overlapping stretches of n consonants. The word gcj contains two such stretches, [0, 1] and [1, 2]. With our methods, we obtain 2 substrings containing the first and 1 containing the second but not the first, which is the correct answer.

Finally, we need to quickly identify all such stretches in the word from left to right. We can do this by iterating through each letter in a word. If this letter is a consonant, we increment the size of the current stretch of consonants, while capping the size at n. If it is not a consonant, we set the size to 0. Then, if the size is n, we are located at the end of a stretch of n consonants, and we should increment our count of substrings by (i - k)(L - j) .

However, we are only keeping track of the current position, which is at the end of the active stretch of consonants, or k, so we need to calculate i in terms of k, which is simply i = j - n + 1 . We then substitute this into our last expression, giving the number of additional substrings contributed by each stretch of consonants as (j - n - k + 1)(L - j) .

Continue reading

GCJ – Osmos

Osmos, Round 1B 2013

This was a fun little problem in recursion. We have a mote that wants to eat other smaller motes, and we want to enable our mote to eat all of the others by either removing uneatable motes or adding new, eatable ones. The first thing to notice is that given a set of motes to try to eat, it is always optimal to try to eat the smallest mote. If we can’t eat it, then we know that we also can’t eat any of the larger motes, and if we can eat it, then we’ll only be larger afterwards, able to eat at least as many of the remaining motes as we could before. Secondly, if we want to remove uneatable motes, it’s always optimal to remove the largest motes we can.  If there are any uneatable motes, then the largest must be one of them, and removing it won’t impede our ability to eat any larger motes, if we decide to add more motes later. Thirdly, if we do wish to add a mote to enable our mote to eat others, it is optimal to always add the largest mote that we can currently eat, which will be 1 unit smaller than our current size.

Together, these suggest that the first thing we want to do is sort the motes from smallest to largest and analyze them in order.  For each mote, we have two possibilities:  first, it may be small enough to eat, in which case we eat it, our mote gets larger, and we move on to the next smallest mote.  Second, it may be too big to eat, in which case we can either add another mote just smaller than our current size in hopes that eating it will allow our mote to grow large enough to continue, or we can simply remove all the remaining motes and be finished.  Which of these we choose depends on which will cost fewer changes.  Removing all remaining motes will cost L, where L is the number of motes remaining.  Adding an additional mote will cost 1 plus the number of changes it takes to eat the remaining motes with our new larger mote.

In general, the cost of eating a list of motes M with a mote of size A is
C(A, M) = \begin{cases} min(L(M), C(2A - 1, M)) & A \le M_0 \\ C(A + M_0, M_{1...L}) & A > M_0 \end{cases}
We can calculate this fairly simply with a recursive formula. Our base cases will be A=1 , where A can never grow and we must remove all motes and L(M)=0 , where there are no motes remaining, and we simply return 0. For other cases, if we can eat the first mote, then we recursively call with A incremented by M_0 and M_0 removed from M . If we cannot, then we return the minimum of L(M) , and the recursive call with A incremented by A - 1 .

Of note is that this can lead to a recursive stack as large as L(M) , but as this is limited to 100 in the problem statement, we shouldn’t have to worry about space limitations with this solution.

Continue reading