
Attacks are often perpetrated using double-encoding. However, it's perfectly reasonable for a non-malicious URL to contain double-encoded values, especially in the search string.
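To make that tension concrete, here is a minimal sketch. It is written in Python rather than CFML and is not how Lucee implements `canonicalize()`. The helper `decode_until_stable()` is a hypothetical stand-in that loosely models multi-pass canonicalization by percent-decoding a value until it stops changing. It ignores HTML entities and the other encodings that `canonicalize()` handles. The `returnTo` parameter is an illustrative example of a legitimately double-encoded value.

```python
from urllib.parse import parse_qs, quote, unquote, urlsplit


def decode_until_stable(value: str, max_rounds: int = 10) -> str:
    """Percent-decode repeatedly until the value stops changing.

    A loose model of multi-pass canonicalization: it only handles
    percent-encoding, not HTML entities or the other encodings that
    canonicalize() looks for.
    """
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            break
        value = decoded
    return value


# Malicious: a payload encoded twice looks harmless after a single decode.
attack = "%253Cscript%253E"
print(unquote(attack))              # %3Cscript%3E
print(decode_until_stable(attack))  # <script>

# Non-malicious: a "returnTo" URL whose own query string is encoded, then
# encoded again so that it can travel as a single parameter value.
inner = "/search?q=" + quote("cats & dogs")  # /search?q=cats%20%26%20dogs
return_to = quote(inner, safe="")
print(unquote(return_to) == inner)           # True: one decode restores it

# Decoding all the way flattens it, and the "&" now splits the query string.
flattened = decode_until_stable(return_to)   # /search?q=cats & dogs
print(parse_qs(urlsplit(flattened).query))   # {'q': ['cats ']}
```

Both values look the same to a multi-pass decoder, which is why a blanket decoding pass cannot tell the attack apart from the legitimate return URL.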

The other day, I wrote about how the canonicalize() function decodes strings that look like HTML entities in Lucee 5. This feature of the function will inadvertently corrupt a URL that is being sanitized. There are other reasons that calling canonicalize() on an entire URL is a bad idea as well (such as the fact that URLs can contain non-malicious encoded values). As such, I wanted to see if I could canonicalize a URL by breaking it up into its various URI components, calling canonicalize() on the individual parts, and then joining those components back into a single, sanitized URL in Lucee CFML 5.3.6.61. To be clear, canonicalization of a URL is a trade-off: when you canonicalize a URL in an effort to protect the user, you are going to break some things. So, please see this post as an exploration and not necessarily as any kind of best practice.
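Here is a rough sketch of the component-wise idea. It is again in Python rather than CFML, so it is not the Lucee code itself, and it rests on a few assumptions. `canonicalize_part()` is a hypothetical stand-in for `canonicalize()` that only percent-decodes until stable. `canonicalize_url()` is an illustrative name, not a Lucee function. Each component is decoded on its own and then re-encoded exactly once, so the structural delimiters (`/`, `?`, `&`, `=`) are never produced by decoding.

```python
from urllib.parse import parse_qsl, quote, unquote, urlencode, urlsplit, urlunsplit


def canonicalize_part(value: str, max_rounds: int = 10) -> str:
    """Hypothetical stand-in for canonicalize(): percent-decode until stable.

    Only models percent-encoding; Lucee's canonicalize() does more than this.
    """
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            break
        value = decoded
    return value


def canonicalize_url(url: str) -> str:
    """Canonicalize each URI component separately, then re-join them."""
    parts = urlsplit(url)

    # Path: decode each segment on its own, then re-encode it with no "safe"
    # characters so a decoded "/" cannot create a new path segment.
    path = "/".join(
        quote(canonicalize_part(segment), safe="")
        for segment in parts.path.split("/")
    )

    # Query: parse_qsl() splits on "&" and "=" and decodes once; finish the
    # decoding per key and value, then let urlencode() re-encode each one.
    pairs = [
        (canonicalize_part(key), canonicalize_part(value))
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    query = urlencode(pairs, quote_via=quote)

    # Fragment: decode, then re-encode (leaving "/" readable).
    fragment = quote(canonicalize_part(parts.fragment), safe="/")

    # Scheme and host are passed through unchanged in this sketch.
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


print(canonicalize_url("https://example.com/a%252Fb/c?q=%253Cscript%253E&x=1#top"))
# https://example.com/a%2Fb/c?q=%3Cscript%3E&x=1#top

print(canonicalize_url(
    "https://example.com/login?returnTo=%2Fsearch%3Fq%3Dcats%2520%2526%2520dogs"
))
# https://example.com/login?returnTo=%2Fsearch%3Fq%3Dcats%20%26%20dogs
```

In the second call, `returnTo` survives as a single parameter, but it loses one layer of encoding, so its nested `&` will split the inner query string once that redirect is followed. The outer URL keeps its structure, while a legitimately double-encoded value still gets flattened; that is the kind of breakage the trade-off implies.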

I do work with security experts, and I try my best to implement their vision and understanding of the world and of malicious actors, but I am just a flawed human.
