  • stc 72 posts 101 karma points
    Feb 28, 2010 @ 17:02

    Extract image from bodyText using XSLT

    Hi guys,

      I was wondering whether it is possible to get an image that was inserted into the bodyText property of my "article" document type. I would like to keep the "article" document type as simple as possible, so I am trying to avoid adding a custom "image" property to it. Mostly that's because I'd like the article editors to work in Windows Live Writer (WLW) or MS Word, and I couldn't find a way to assign a picture to document type properties that aren't simple or rich-text fields...

      A best-practice solution would be great, but please suggest one that doesn't require the article editors to use the Umbraco backend. Thanks in advance.

  • stc 72 posts 101 karma points
    Feb 28, 2010 @ 17:04

    Unfortunately, there seems to be no way to edit posts, so just in case it's unclear: if there are multiple images in the article, I'd like to get the first one :)

  • Morten Bock 1867 posts 2140 karma points MVP 2x admin c-trib
    Feb 28, 2010 @ 17:15

    I think I would create an XSLT extension for that purpose, and use a RegEx to find the first img element and get its src attribute.
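    A minimal sketch of that idea in C#, assuming an Umbraco 4-style XSLT extension. The namespace, class and method names (Example.XsltExtensions.ImageHelper.FirstImageSrc) are hypothetical, and registering the class so XSLT can call it (in Umbraco 4 that is done in config/xsltExtensions.config) is not shown. The method is static but the class is a plain class, in case the extension loader needs to create an instance. It returns the src of the first <img> with a non-empty src (double-quoted, single-quoted or unquoted), or an empty string, so the XSLT only has to test for an empty result.

```csharp
using System;
using System.Text.RegularExpressions;

namespace Example.XsltExtensions // hypothetical namespace
{
    // Hypothetical XSLT extension class; it must still be registered with Umbraco.
    public class ImageHelper
    {
        // Matches a whole <img ...> tag.
        private static readonly Regex ImgTag =
            new Regex(@"<img\s[^>]*>", RegexOptions.IgnoreCase);

        // Matches a src attribute whose value is double-quoted, single-quoted or unquoted.
        private static readonly Regex SrcAttr = new Regex(
            @"\bsrc\s*=\s*(?:""(?<src>[^""]*)""|'(?<src>[^']*)'|(?<src>[^\s>]+))",
            RegexOptions.IgnoreCase);

        // Returns the src of the first <img> that has a non-empty src, or "" if there is none.
        public static string FirstImageSrc(string html)
        {
            if (String.IsNullOrEmpty(html))
            {
                return String.Empty;
            }

            foreach (Match img in ImgTag.Matches(html))
            {
                Match src = SrcAttr.Match(img.Value);
                if (src.Success && src.Groups["src"].Value.Length > 0)
                {
                    return src.Groups["src"].Value;
                }
            }

            return String.Empty;
        }
    }
}
```

    From XSLT you would pass the bodyText value in as a string. Like any regex over HTML, it can be fooled by unusual markup (for example a ">" inside an attribute value), which is why a real HTML parser is the more robust option.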

  • stc 72 posts 101 karma points
    Feb 28, 2010 @ 19:07

    Umm huh :( ... figured as much, but I was kinda hoping there was some umbraco.library GetMedia-like thingy :)

    Could I propose that the Umbraco database be extended so that it records which media items were posted along with which article (content node)? That way this could be done easily, and it would be a rather useful feature IMHO, although where media items are reused across several articles you'd be pressed to fall back on something else.

    Thanks anyway, Morten.

  • Morten Bock 1867 posts 2140 karma points MVP 2x admin c-trib
    Feb 28, 2010 @ 19:11

    Sorry to disappoint you :-)

    But the only reference to the media that gets saved is the string in the HTML.

  • Lee Kelleher 4026 posts 15836 karma points MVP 13x admin c-trib
    Mar 01, 2010 @ 06:05

    Hi stc,

    Here's an idea for a solution to your specific problem. Code can be written to hook into the Document.BeforePublish event, examine the "article" (body text) value for any HTML images, extract the first one and assign it to a different property.

    http://our.umbraco.org/wiki/reference/api-cheatsheet/using-applicationbase-to-register-events

    Usually, I'd suggest using a regular expression to get the <img> tags from the HTML... but now I'd recommend the Html Agility Pack:

    http://www.codeplex.com/htmlagilitypack

    Here's a quick snippet from StackOverflow on how to extract <img> tags from HTML:

    http://stackoverflow.com/questions/790559/how-to-extract-image-urls-from-html-file-in-c/790566#790566

    Obviously this is just an idea... I haven't written any code to do this ... and if you're not a .NET developer, then it can seem very very daunting!

    I don't think this is something that belongs in the Umbraco core; it's specific to your problem (although many others would probably find it useful).

    Cheers, Lee.

  • Lee Kelleher 4026 posts 15836 karma points MVP 13x admin c-trib
    Mar 01, 2010 @ 06:47

    Hi stc,

    I couldn't help myself! I've gone with the regular expression approach, only to keep all the code self-contained in this snippet and within the .NET Framework. Personally I'd go with the Html Agility Pack, but explaining the references etc. is too much effort for this code snippet.

    namespace Bodenko.Umbraco.ApplicationEvents
    {
        using System;
        using System.Text.RegularExpressions;
        using umbraco.BusinessLogic;
        using umbraco.cms.businesslogic;
        using umbraco.cms.businesslogic.property;
        using umbraco.cms.businesslogic.web;
    
        public class ExtractImageAssignProperty : ApplicationBase
        {
            public ExtractImageAssignProperty()
            {
                Document.BeforePublish += new Document.PublishEventHandler(Document_BeforePublish);
            }
    
            void Document_BeforePublish(Document sender, PublishEventArgs e)
            {
            try
            {
                // get the article (body text) and image properties from the document
                Property bodyText = sender.getProperty("article");
                Property docImage = sender.getProperty("image");
    
                // check that both properties exist and that there is some HTML to examine
                if (bodyText == null || bodyText.Value == null || docImage == null)
                {
                    return;
                }
    
                // grab the value
                String html = bodyText.Value.ToString();
    
                // set the regular expressions
                Regex regImages = new Regex(@"<img\s[^>]*>", RegexOptions.IgnoreCase);
                Regex regSrc = new Regex(@"src=(?:(['""])(?<src>(?:(?!\1).)*)\1|(?<src>[^\s>]+))", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    
                // loop through each of the image matches (we can't assume the first one is valid)
                foreach (Match image in regImages.Matches(html))
                {
                    // look for a 'src' attribute in this <img> tag
                    Match src = regSrc.Match(image.Value);
    
                    // check that the 'src' attribute exists and has a value (which should be the image URL)
                    if (src.Success && !String.IsNullOrEmpty(src.Groups["src"].Value))
                    {
                        // assign the image URL to the document property
                        docImage.Value = src.Groups["src"].Value;
    
                        // since we are only interested in the first image tag,
                        // break out of the foreach loop
                        break;
                    }
                }
            }
            catch
            {
                // if we catch an exception, we still want the document to be published
                // without a YSoD, so handle it however you prefer here (e.g. ELMAH or other logging)
            }
            }
        }
    }

    I haven't tested this in any way. It should work, but I'd suggest that you test it out on a dev site/server first!!! (That is, if you want to try it out at all. Feel free to say no.)
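    If you'd like to sanity-check the regular expressions before going near a dev site, the tricky one is regSrc: (['"]) captures whichever quote opens the value, and (?:(?!\1).)* then consumes characters until that same quote comes back, so a single-quoted value may contain a double quote; the second alternative handles unquoted values. Here's a small stand-alone sketch (the SrcRegexCheck class name is made up) that uses the pattern and options from the snippet and can be dropped into a console app or unit test, no Umbraco required:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for checking the snippet's 'src' pattern in isolation.
public static class SrcRegexCheck
{
    // Same pattern and options as regSrc in the snippet above.
    private static readonly Regex RegSrc = new Regex(
        @"src=(?:(['""])(?<src>(?:(?!\1).)*)\1|(?<src>[^\s>]+))",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the captured src value ("" for src=""), or null when there is no src attribute.
    public static string Src(string imgTag)
    {
        Match m = RegSrc.Match(imgTag);
        return m.Success ? m.Groups["src"].Value : null;
    }
}
```

    Note that src="" still matches, just with an empty capture, which is why the event handler also checks String.IsNullOrEmpty before using the value.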

    For anyone else who finds this code useful... then WTFPL applies nicely! ;-)

    Cheers, Lee.

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jun 04, 2010 @ 11:53

    Lee,

    Nice solution. A quick suggestion, maybe overkill, but it would reduce the size of the image extraction method: you could load the HTML into the HTML Agility Pack and XPath it out.

    Regards

    Ismail
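    The HTML Agility Pack is a separate download, so as a stand-in, here's a sketch of the same XPath idea using only System.Xml from the .NET Framework. This is a swapped-in technique, not the Agility Pack's API, and it assumes the stored bodyText is well-formed XHTML: unclosed tags or HTML-only entities such as &nbsp; are not valid XML and make the parse fail (the method then returns null), which is exactly the kind of markup the Agility Pack copes with. The XPathImageHelper name is hypothetical.

```csharp
using System;
using System.Xml;

// Hypothetical helper: XPath over the body text, assuming it is well-formed XHTML.
public class XPathImageHelper
{
    // Returns the src of the first <img> with a non-empty src, or null when there is
    // none or the fragment cannot be parsed as XML.
    public static string FirstImageSrc(string xhtml)
    {
        if (String.IsNullOrEmpty(xhtml))
        {
            return null;
        }

        XmlDocument doc = new XmlDocument();
        try
        {
            // Wrap the fragment so it has a single root element.
            doc.LoadXml("<root>" + xhtml + "</root>");
        }
        catch (XmlException)
        {
            // e.g. unclosed tags or undeclared entities such as &nbsp;
            return null;
        }

        // XPath names are case-sensitive, so this only finds lowercase <img> elements.
        XmlNode src = doc.SelectSingleNode("//img[@src != '']/@src");
        return src == null ? null : src.Value;
    }
}
```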

  • Lee Kelleher 4026 posts 15836 karma points MVP 13x admin c-trib
    Jun 04, 2010 @ 12:02

    Hi Ismail, I mention HTML Agility Pack just before the code snippet. ;-)

    I used RegEx in the snippet as an example, to keep it self-contained within the .NET framework! But yes, the HTML Agility Pack is awesome for this kind of thing!

    Cheers, Lee.

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jun 04, 2010 @ 16:05

    Lee,

    Doh, I need to read things properly, LOL!
