  • Mike Chambers 636 posts 1253 karma points c-trib
    Sep 23, 2010 @ 00:34

    umbracoUseDirectoryUrls allows commas in the URL...

    For instance, I have this URL:

     http://www.domain.com/jp/briefings/briefings/obligation,-opportunity-or-self-preservation

    Googling whether commas are allowed by the spec turns up no agreement: some say they are reserved, some say they are allowed. My vote would be not to use them.

     

    2.2. Reserved Characters

    Many URI include components consisting of or delimited by, certain special characters. These characters are called "reserved", since their usage within the URI component is limited to their reserved purpose. If the data for a URI component would conflict with the reserved purpose, then the conflicting data must be escaped before forming the URI.

     reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | 
                    "$" | "," 
    

    The "reserved" syntax class above refers to those characters that are allowed within a URI, but which may not be allowed within a particular component of the generic URI syntax
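
    As a rough illustration of the reserved set quoted above, the sketch below checks whether a URL component contains any of those characters. The class name, method name and sample strings are invented for this example and are not part of Umbraco.

```csharp
using System;

public static class ReservedCheckSketch
{
    // The reserved set, copied from the grammar quoted above.
    private static readonly char[] Reserved =
        { ';', '/', '?', ':', '@', '&', '=', '+', '$', ',' };

    // Hypothetical helper: true if the component contains a reserved character.
    public static bool ContainsReserved(string component)
    {
        return component.IndexOfAny(Reserved) >= 0;
    }

    public static void Main()
    {
        // The last path segment of the URL from the post, with and without the comma.
        Console.WriteLine(ContainsReserved("obligation,-opportunity-or-self-preservation")); // True
        Console.WriteLine(ContainsReserved("obligation-opportunity-or-self-preservation"));  // False
    }
}
```

    A check like this only flags the character; whether it is actually a problem still depends on which URL component it appears in.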

  • Dekker 13 posts 33 karma points
    Sep 23, 2010 @ 02:54

    Short answer: you can use commas, but only as delimiters (between parameters, for example), not as part of the "address" itself.

    Long answer - read on!

    When searching, go straight to the source...

    In RFC 1738 (from 1994), which defines exactly what a URL is (the link goes right to the page), commas were allowed...

    No corresponding graphic US-ASCII:
    
    URLs are written only with the graphic printable characters of the
    US-ASCII coded character set. The octets 80-FF hexadecimal are not
    used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
    control characters; these must be encoded.
    
    Unsafe:
    
    Characters can be unsafe for a number of reasons.  The space
    character is unsafe because significant spaces may disappear and
    insignificant spaces may be introduced when URLs are transcribed or
    typeset or subjected to the treatment of word-processing programs.
    The characters "<" and ">" are unsafe because they are used as the
    delimiters around URLs in free text; the quote mark (""") is used to
    delimit URLs in some systems.  The character "#" is unsafe and should
    always be encoded because it is used in World Wide Web and in other
    systems to delimit a URL from a fragment/anchor identifier that might
    follow it.  The character "%" is unsafe because it is used for
    encodings of other characters.  Other characters are unsafe because
    gateways and other transport agents are known to sometimes modify
    such characters. These characters are "{", "}", "|", "\", "^", "~",
    "[", "]", and "`".
    
    All unsafe characters must always be encoded within a URL. For
    example, the character "#" must be encoded within URLs even in
    systems that do not normally deal with fragment or anchor
    identifiers, so that if the URL is copied into another system that
    does use them, it will not be necessary to change the URL encoding.
    alpha          = lowalpha | hialpha
    digit          = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                     "8" | "9"
    safe           = "$" | "-" | "_" | "." | "+"
    extra          = "!" | "*" | "'" | "(" | ")" | ","
    national       = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]" | "`"
    punctuation    = "<" | ">" | "#" | "%" | <">
    
    unreserved     = alpha | digit | safe | extra
    uchar          = unreserved | escape
    xchar          = unreserved | reserved | escape
    digits         = 1*digit
    

    Therefore, commas were classed as "extra" characters, which are part of the "unreserved" set and so could appear in a URL without being encoded.
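
    To make the RFC 1738 grammar above easier to follow, here is a minimal sketch that sorts a character into the classes it lists. The class and method names are hypothetical; only the character sets come from the quoted grammar.

```csharp
using System;

public static class Rfc1738Sketch
{
    // Returns the name of the RFC 1738 character class, per the grammar quoted above.
    public static string Classify(char c)
    {
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) return "alpha";
        if (c >= '0' && c <= '9') return "digit";
        if ("$-_.+".IndexOf(c) >= 0) return "safe";
        if ("!*'(),".IndexOf(c) >= 0) return "extra";
        if ("{}|\\^~[]`".IndexOf(c) >= 0) return "national";
        if ("<>#%\"".IndexOf(c) >= 0) return "punctuation";
        return "other";
    }

    // unreserved = alpha | digit | safe | extra
    public static bool IsUnreserved(char c)
    {
        string cls = Classify(c);
        return cls == "alpha" || cls == "digit" || cls == "safe" || cls == "extra";
    }

    public static void Main()
    {
        Console.WriteLine(Classify(','));     // extra
        Console.WriteLine(IsUnreserved(',')); // True
        Console.WriteLine(IsUnreserved('#')); // False
    }
}
```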

    BUT

    An update to the standard, RFC 3986 (Jan 2005), states:

    URI producing applications should percent-encode data octets that
    correspond to characters in the reserved set unless these characters
    are specifically allowed by the URI scheme to represent data in that
    component.  If a reserved character is found in a URI component and
    no delimiting role is known for that character, then it must be
    interpreted as representing the data octet corresponding to that
    character's encoding in US-ASCII.
    
    Therefore commas are allowed as *delimiters* within URLs (separating parameters after a # for instance), but a comma that is just data in the URL body should be percent-encoded rather than left as-is. Interesting reading, really, and I would like to point out that the standard Umbraco conversion of page name into URL FAILS this test miserably :)
    And that is why you can (and cannot) use commas in your URLs.
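
    To illustrate the percent-encoding that the RFC 3986 passage describes, here is a minimal sketch using `Uri.EscapeDataString` from the .NET base class library. The class name and the sample segment are made up for illustration; this is not how Umbraco itself builds its URLs.

```csharp
using System;

public static class CommaEncodingSketch
{
    // Hypothetical helper: percent-encodes a single path segment.
    public static string EncodeSegment(string segment)
    {
        return Uri.EscapeDataString(segment);
    }

    public static void Main()
    {
        // The comma is data here, not a delimiter, so it gets encoded as %2C.
        Console.WriteLine(EncodeSegment("obligation,-opportunity-or-self-preservation"));
        // obligation%2C-opportunity-or-self-preservation
    }
}
```

    Encode one segment at a time: passing a whole URL to `Uri.EscapeDataString` would also encode the slashes and the colon.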

     

    Dekker

  • Stefan Kip 1614 posts 4131 karma points c-trib
    Sep 23, 2010 @ 09:20

    @Dekker
    Don't ask for thumbs-up; if your post is good, it will get them :-)

    On topic: you can add the comma to the urlReplacing section of the config\umbracoSettings.config file...
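
    As a sketch of what such an entry might look like: this assumes the `<char org="...">` format that the stock urlReplacing section uses, where `org` is the character to replace and the element body is its replacement. Check the entries already present in your own umbracoSettings.config before copying it.

```xml
<urlReplacing>
  <!-- existing entries stay as they are -->
  <!-- assumed format: an empty body removes the character from generated URLs -->
  <char org=","></char>
</urlReplacing>
```

    Existing pages may need to be republished before their URLs change.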

  • Rik Helsen 670 posts 873 karma points
    Sep 23, 2010 @ 09:23

    In the end you'll need to add quite a few characters to that umbracoSettings.config section if you don't want nasty HTML-encoded links (all the French accented characters such as é, à, ç, è, ...)

    Kind regards,

    Rik

  • Stefan Kip 1614 posts 4131 karma points c-trib
    Sep 23, 2010 @ 09:25

    Yes, that's too bad, isn't it?
    Umbraco should give you the option to define your own 'GenerateNiceUrl' method, something like this:

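    using System.Text.RegularExpressions;

    public static string GenerateSlug(string phrase)
    {
        // Lower-case first so the whitelist below only needs a-z.
        string str = phrase.ToLowerInvariant();
        str = Regex.Replace(str, @"[^a-z0-9\s-]", ""); // drop everything except letters, digits, whitespace and hyphens
        str = Regex.Replace(str, @"\s+", " ").Trim(); // collapse runs of whitespace into a single space
        str = str.Substring(0, str.Length <= 45 ? str.Length : 45).Trim(); // cut to at most 45 characters and trim
        str = Regex.Replace(str, @"\s", "-"); // replace the remaining spaces with hyphens
        return str;
    }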
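
    One caveat with the slug method above is that accented characters are deleted outright, so "é" vanishes rather than becoming "e". A possible variation, sketched here under the assumption that folding accents to their base letter is what you want, decomposes the text with `string.Normalize` and drops the combining marks first. `SlugSketch` and `RemoveDiacritics` are hypothetical names, not Umbraco APIs.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class SlugSketch
{
    // Hypothetical helper: turns "é" into "e" by decomposing and dropping combining marks.
    public static string RemoveDiacritics(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Same steps as the slug method in the post, with accent folding added up front.
    public static string GenerateSlug(string phrase)
    {
        string str = RemoveDiacritics(phrase).ToLowerInvariant();
        str = Regex.Replace(str, @"[^a-z0-9\s-]", "");           // drop invalid characters
        str = Regex.Replace(str, @"\s+", " ").Trim();            // collapse whitespace
        str = str.Substring(0, Math.Min(str.Length, 45)).Trim(); // cut to 45 characters
        str = Regex.Replace(str, @"\s", "-");                    // spaces become hyphens
        return str;
    }

    public static void Main()
    {
        Console.WriteLine(GenerateSlug("Obligation, opportunity or self-preservation"));
        // obligation-opportunity-or-self-preservation
        Console.WriteLine(GenerateSlug("Café à la crème"));
        // cafe-a-la-creme
    }
}
```

    Letters that do not decompose into a base letter plus a mark (for example "æ" or "ø") are still removed by the whitelist, so those would need explicit replacements.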