Challenge #14 – Testing Santa’s Patience

11 December 2023 Solved Php Difficult

Santa is gear­ing up for the hol­i­day sea­son by mod­ern­iz­ing the gift list sys­tem at his North Pole headquarters.

Based on Rudolph’s expert recom­mend­a­tion, they are using Craft CMS to organ­ize links to the vari­ous online sup­pli­ers and gifts.

Unfor­tu­nately, the con­tent author­ing aspect of the oper­a­tion was del­eg­ated to the elves, who are not only tech­nic­ally chal­lenged, but also dipped a bit too deeply into the eggnog.

The elves didn’t know any­thing about Craft CMS’s cat­egor­ies or sites or any­thing sens­ible, but instead cre­ated a Gifts sec­tion with two plain text fields: Supplier URL and Gift URI

They put the URL to the sup­pli­er in the Supplier URL field, and the URI path in the Gift URI field, so we have to com­bine the two togeth­er to make a com­plete URL to the gift that Santa can use.

Why did they do this? One can­not won­der the tangled ball of yarn that is the elve’s brains, espe­cially fully sat­ur­ated with alcohol.

But it gets worse.

Whilst copy-past­ing the Supplier URLs and Gift URIs into Craft CMS, they didn’t always do a great job of it. Often they copied too much of each, so they over­lap, and can’t be simply combined.

For example, look at this monstrosity:

Sup­pli­er URL: https://SantasHelper.com/es/products/gifts/
Gift URI: /es/products/gifts/three-foot-candy-cane

The final URL should look like this: 

https://SantasHelper.com/es/products/gifts/three-foot-candy-cane

…but if we just com­bined the Supplier URL and Gift URI togeth­er, it would look like this: 

https://SantasHelper.com/es/products/gifts//es/products/gifts/three-foot-candy-cane

What you need to do is write an algorithm that com­bines a Supplier URL togeth­er with a Gift URI in such a way that any over­lap­ping suf­fix and pre­fix are merged together.

We’ll be doing this in PHP, but nev­er fear if PHP isn’t your strong suit! We’ve provided a cus­tom mod­ule in which you just have to imple­ment a mergeUrlWithPath() function.

This Git­Hub repo con­tains a Craft CMS site that you can spin up with a single com­mand, either loc­ally or in the browser using a Git­Hub Codespace (see the readme file in the repo). The modules/Module.php file con­tains a mergeUrlWithPath() meth­od that you can modify.

Because Santa is a per­fec­tion­ist, he’s writ­ten a num­ber of tests that your algorithm will have to pass before it’s con­sidered accept­able. Santa wrote these tests using the fant­ast­ic Pest test­ing frame­work (with the aid of Mark Huot’s Craft Pest plugin).

To use it, with the pro­ject run­ning, make a new ter­min­al win­dow by click­ing on the + icon in the Git­Hub CodeSpaces ter­min­al (or if you’re doing it loc­ally, open up a new ter­min­al how­ever you like) and type:

make test

You’ll ini­tially see some­thing like this:

Failing tests

As you write your algorithm, your goal is to make it pass all of the tests… and con­grat­u­la­tions! You’ve just taken your first steps into the world of Test Driv­en Devel­op­ment (TDD)!

Rules

Your solu­tion should con­sist of a work­ing mergeUrlWithPath() meth­od that makes all tests pass.

Tips

Loop over strings, use recur­sion or oth­er­wise, while watch­ing out for pesky slashes.

Acknow­ledge­ments

Writ­ten by Andrew Welch & Ben Croker.

Solution

There are two gen­er­al approaches we can take to solve this.

String Com­par­is­on
Loop over the char­ac­ters in the path and com­pare them to the char­ac­ters in the URL. Find the largest over­lap­ping URL suf­fix and path pre­fix and remove the duplic­a­tion before merging.

Seg­ment Comparison
Break the URL and path into seg­ments. Find the com­mon seg­ments at the end of the URL and start of the path and remove them before merging.


String Com­par­is­on

This approach involves com­par­ing the URL suf­fix (the end of the string) with the path pre­fix (the start of the string) of vary­ing lengths. What we’re look­ing for is the largest over­lap­ping sub­string, which we will then remove from the path before mer­ging it with the URL.

Since the URL and path may or not end and start with a slash, it is safest to first nor­mal­ize them. We can do this by trim­ming a slash (from the right and the left, respect­ively, to avoid end­ing up with a double slashes) and then adding it back in.

$url = rtrim($url, '/') . '/';
$path = '/' . ltrim($path, '/');

Next comes our test for an over­lap. We’ll wrap this in a for loop, start­ing with an empty over­lap, and add a char­ac­ter from the path at each iter­a­tion. Any time the URL ends with the $test string, we’ll update the value of $overlap.

$overlap = 0;
for ($i = 0; $i < strlen($path); $i++) {
    $test = substr($path, 0, $i);
    if (str_ends_with($url, $test)) {
        $overlap = $i;
    }
}

Finally, we merge the URL with the part of the path after the overlap.

return $url . substr($path, $overlap);

Notice how our test string grows with each iter­a­tion. This res­ults in n iter­a­tions where n is the length of the path. 

An altern­at­ive, slightly more optim­al solu­tion would be to reverse the dir­ec­tion of the loop, and hence the test string. Start­ing from the full path, we’ll take a char­ac­ter off the end in each iter­a­tion. This allows us to break out of the for loop as soon as we find a match, mean­ing that n is the length of the over­lap (and at most the length of the path).

$overlap = 0;
for ($i = strlen($path); $i > 0; $i--) {
    $test = substr($path, 0, $i);
    if (str_ends_with($url, $test)) {
        $overlap = $i;
        break;
    }
}

The full solu­tion is then as follows.

public function mergeUrlWithPath(string $url, string $path): string
{
    $url = rtrim($url, '/') . '/';
    $path = '/' . ltrim($path, '/');
    
    $overlap = 0;
    for ($i = strlen($path); $i > 0; $i--) {
        $test = substr($path, 0, $i);
        if (str_ends_with($url, $test)) {
            $overlap = $i;
            break;
        }
    }
    
    return $url . substr($path, $overlap);
}

This passes all of of the provided tests.

Passing tests

Ben Croker came up with a sim­il­ar imple­ment­a­tion using recursion.

Andrew Welch used a while loop, com­par­ing the strings in a slightly dif­fer­ent way. 


Seg­ment Comparison

This approach involves first break­ing the URL and path into seg­ments using a slash delim­iter, find­ing the over­lap­ping seg­ments and then mer­ging the unique seg­ments back togeth­er. There are vari­ous ways of doing this and we’re going to refer to the sub­mit­ted solu­tions rather than walk through any par­tic­u­lar implementation.

Chris Sar­geant delivered a solu­tion that, while some­what intric­ate, passes all tests.

Robin Gau­th­i­er and Lukas Jansen both lever­aged PHP’s parse_url func­tion to help parse the URL and path before per­form­ing their comparisons.

Kris­ti­an S. P. sub­mit­ted a metic­u­lously com­men­ted solu­tion and even sub­mit­ted a second, reads-more-like-a-Ger­man-sen­tence solu­tion. Jawohl!

Chris Viol­ette made use of Craft’s UrlHelper and FileHelper to help con­struct his solution.

Michael Thomas took advant­age of col­lec­tion meth­ods to push and san­it­ise URI segments.


Edge-case Issues

All of the sub­mit­ted solu­tions passed all tests, mak­ing every­one a winner! 

The story, how­ever, takes a turn for the worse.

There were some edge-case sup­pli­er URL and gift URI com­bin­a­tions in the mix that were incor­rectly merged. This meant that some sup­pli­ers nev­er received the gifts they were sup­posed to. Santa’s tests, it turns out, were fallible.

Only two of the sub­mit­ted solu­tions [1, 2] man­aged to pass the fol­low­ing unfore­seen edge-cases”. All oth­er solu­tions failed in one or more of them.

Supplier URL: 'https://SantasHelper.com/es'
Gift URI: 'essentials/gifts/3ft-cane'
Incorrect Result: 'https://SantasHelper.com/es/sentials/gifts/3ft-cane'

Supplier URL: 'https://SantasHelper.com/es/'  
Gift URI: '/essentials/es/gifts/3ft-cane'
Incorrect Result: 'https://SantasHelper.com/es/essentials/gifts/3ft-cane'

Supplier URL: 'https://SantasHelper.com/es/gifts/'  
Gift URI: '/gifts/3ft-cane'
Incorrect Result: 'https://SantasHelper.com/es/gifts/gifts/3ft-cane'

Supplier URL: 'https://SantasHelper.com/essentials/es/'  
Gift URI: '/essentials/es/gifts/3ft-cane'
Incorrect Result: 'https://SantasHelper.com/essentials/es/essentials/es/gifts/3ft-cane'

Con­clu­sion

Edge-cases are hard to catch ahead of time. This is one of the down­falls of test-driv­en-devel­op­ment. You may think you fully under­stand a prob­lem before imple­ment­ing it, but this does not rule out the value of writ­ing tests after you’ve man­aged to get your ini­tial tests to all pass.

It turns out that even Santa is only human”. Hope­fully he learns his les­son and demands prop­erly format­ted con­tent from his sup­pli­ers next year!

Submitted Solutions

  • Chris Sargeant
  • Robin Gauthier
  • Andrew Welch
  • Kristian S. P.
  • Chris Violette
  • Lukas Jansen
  • Ben Croker
  • Michael Thomas