how to access a media file on azure blob storage directly not via http url


Heather Floyd 610 posts 1033 karma points MVP 6x c-trib

Jun 21, 2023 @ 16:59

How to access a media file on Azure Blob Storage directly? (Not via http URL)

Hello friends,

I have some code that runs on a media file "Save" event. It extracts the content from a PDF and stores it in a text field on the Media item for use in searching, etc. This uses TikaOnDotNet.TextExtraction.TextExtractor (see: https://github.com/KevM/tikaondotnet#usage).

When running locally, this code works great - it can access the PDF file via the /Media/ folder and read it.

The site is hosted on Umbraco Cloud, which uses Azure Blob Storage for media. If the file cannot be located in the Media folder (which is generally empty on Cloud sites), the code falls back to the media URL and downloads the file via new WebClient().DownloadData(uri). If the Media file is added or saved on the Live environment, this works just fine, since the URI is publicly accessible. However, if it is added on a Development or Staging environment, the download fails because those environments are protected by Basic Auth.

Can anyone recommend a way to read a media file from Azure Blob Storage on those protected environments?

Heather Floyd 610 posts 1033 karma points MVP 6x c-trib

Jun 21, 2023 @ 18:10

Thanks to help from Nik Rimington and Anders Bjerner, I was able to find a solution utilizing Umbraco's IMediaFileSystem.

Stripped-down example using Dependency Injection:

using Umbraco.Core.IO;
private readonly IMediaFileSystem _mediaFileSystem;
...

// Open a stream for reading the file contents
using (var fs = _mediaFileSystem.OpenFile(mediaUmbracoFile))
{
    if (fs != null)
    {
        var fileTextContent = mediaParser.ParseMediaText(fs,
            out extractedMetaFromTika);
        ...
    }
    else
    {
        _iLogger.Error(typeof(RegisterEventsComponent),
            new Exception($"Unable to open PDF file {fileInfo.FullName}"),
            "Unable to Open PDF file");
    }
}
   ...

public string ParseMediaText(Stream SourceStream, out Dictionary<string, string> MetaData)
{
    var sb = new StringBuilder();
    var metaData = new Dictionary<string, string>();
    var textExtractor = new TextExtractor();
    try
    {
        using (var memoryStream = new MemoryStream())
        {
            SourceStream.CopyTo(memoryStream);
            var streamBytes = memoryStream.ToArray();

            var textExtractionResult = textExtractor.Extract(streamBytes);
            sb.Append(textExtractionResult.Text);
            metaData = (Dictionary<string, string>)textExtractionResult.Metadata;
        }
    }
    catch (Exception ex)
    {
        // No interpolation is needed here, so a plain string literal is enough
        var msg = "MediaParserService.ParseMediaText: Could not read media item provided by stream";
        throw new Exception(msg, ex);
    }
    MetaData = metaData;
    return sb.ToString();
}

Additionally, for a v9+ implementation, the tip is to look at https://github.com/umbraco/UmbracoExamine.PDF/blob/v11/dev/src/UmbracoExamine.PDF/PdfPigTextExtractor.cs
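
For illustration only, here is a minimal sketch of how the same idea might look in v9+, assuming MediaFileManager (from Umbraco.Cms.Core.IO) is injected in place of v8's IMediaFileSystem. The MediaFileReader class, its ReadAllBytes method, and the example path format are hypothetical names made up for this sketch, and I have not tested it on a Cloud site:

```csharp
using System.IO;
using Umbraco.Cms.Core.IO;

// Hypothetical helper for illustration; not part of Umbraco itself.
public class MediaFileReader
{
    private readonly MediaFileManager _mediaFileManager;

    // Assumption: MediaFileManager is resolved through constructor injection.
    public MediaFileReader(MediaFileManager mediaFileManager)
    {
        _mediaFileManager = mediaFileManager;
    }

    // mediaPath is assumed to be the path stored in the media item's
    // umbracoFile property (something like "/media/xyz/file.pdf").
    // Returns null when the configured file system cannot find the file.
    public byte[] ReadAllBytes(string mediaPath)
    {
        var fileSystem = _mediaFileManager.FileSystem;
        if (!fileSystem.FileExists(mediaPath))
        {
            return null;
        }

        // Read through the configured file system (local disk or blob
        // storage) rather than over HTTP, so Basic Auth is not involved.
        using (Stream stream = fileSystem.OpenFile(mediaPath))
        using (var memoryStream = new MemoryStream())
        {
            stream.CopyTo(memoryStream);
            return memoryStream.ToArray();
        }
    }
}
```

The returned bytes could then be handed to a text extractor in the same way ParseMediaText does above.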

Gurumurthy 56 posts 129 karma points

Aug 15, 2023 @ 23:26

Hi,

The private readonly IMediaFileSystem _mediaFileSystem field is part of Umbraco v8, right? How can we get the same in Umbraco 11?

Basically, I need to get the full path of a media file that is stored in Azure Blob Storage.

Thanks,
