ASP.NET Web API file download service with resume support

ASP.NET Web API provides out-of-the-box support for streaming binary files to the client. For more advanced scenarios, however, you need to add custom logic that handles the appropriate HTTP headers to support pause/resume functionality. In this post I will address this problem and build a resume-supporting file download service using two different approaches:

  • stream wrapper for FileStream that can return partial data,
  • memory mapped files.
Memory-mapped files seem to be an interesting candidate, as they may offer performance benefits such as memory caching and optimized file access managed by the virtual-memory manager.

 

Simple file download service

Thanks to the StreamContent class, creating a basic file download service in ASP.NET Web API is a relatively straightforward task. Let's start by implementing a basic scenario, where files are served from a directory.
Instead of dealing with file system access directly in controllers, I like to encapsulate this functionality in a dedicated object, which makes unit testing/mocking easier and keeps the code tidier. For our examples we will create an IFileProvider interface that exposes three operations:

public interface IFileProvider  
{
    bool Exists(string name);
    FileStream Open(string name);
    long GetLength(string name);
}

The actual implementation uses an app setting in the web.config file to configure the storage folder location.

public class FileProvider : IFileProvider  
{
    private readonly string _filesDirectory;
    const string DefaultFileLocation = "Files";
    private const string AppSettingsKey = "FileProvider.FilesLocation";

    public FileProvider()
    {
        _filesDirectory = DefaultFileLocation;
        var fileLocation = ConfigurationManager.AppSettings[AppSettingsKey];
        if(!String.IsNullOrWhiteSpace(fileLocation))
        {
            _filesDirectory = fileLocation;
        }
    }

    public bool Exists(string name)
    {
        //make sure we don't access directories outside of our store, for security reasons
        string file = Directory.GetFiles(_filesDirectory, name, SearchOption.TopDirectoryOnly)
                .FirstOrDefault();
        return file != null;
    }

    public FileStream Open(string name)
    {
        return File.Open(GetFilePath(name), 
            FileMode.Open, FileAccess.Read);
    }

    public long GetLength(string name)
    {
        return new FileInfo(GetFilePath(name)).Length;
    }

    private string GetFilePath(string name)
    {
        return Path.Combine(_filesDirectory, name);
    }
}
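One subtlety worth noting: Exists guards against escaping the store (it only searches the top directory), but Open and GetLength combine paths directly. A defensive variant of GetFilePath could canonicalize the path and verify it stays inside the store. This is a sketch of my own, not part of the original code:

```csharp
private string GetFilePath(string name)
{
    //canonicalize and make sure the resolved path stays inside our store
    string fullPath = Path.GetFullPath(Path.Combine(_filesDirectory, name));
    string storeRoot = Path.GetFullPath(_filesDirectory);
    if (!fullPath.StartsWith(storeRoot, StringComparison.OrdinalIgnoreCase))
    {
        throw new ArgumentException("Invalid file name", "name");
    }
    return fullPath;
}
```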

 

<appSettings>  
    <!-- (...) -->
    <add key="FileProvider.FilesLocation" value="H:\Storage" />
</appSettings>

With file access logic ready we can write code that actually serves the data. A simple Web API controller that streams files will look like this:

public class SimpleFilesController : ApiController  
{
    public IFileProvider FileProvider { get; set; }

    public SimpleFilesController()
    {
        FileProvider = new FileProvider();
    }

    public HttpResponseMessage Get(string fileName)
    {
        if (!FileProvider.Exists(fileName))
        {
            throw new HttpResponseException(HttpStatusCode.NotFound);
        }

        FileStream fileStream = FileProvider.Open(fileName);
        var response = new HttpResponseMessage();
        response.Content = new StreamContent(fileStream);
        response.Content.Headers.ContentDisposition
            = new ContentDispositionHeaderValue("attachment");
        response.Content.Headers.ContentDisposition.FileName = fileName;
        response.Content.Headers.ContentType
            = new MediaTypeHeaderValue("application/octet-stream");
        response.Content.Headers.ContentLength 
                = FileProvider.GetLength(fileName);
        return response;
    }
}
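For completeness, the controller above assumes a route that maps the file name from the URL. The post does not show its routing setup, so the following WebApiConfig registration is an assumption:

```csharp
config.Routes.MapHttpRoute(
    name: "FilesApi",                       //assumed route name
    routeTemplate: "api/files/{fileName}",
    defaults: new { controller = "SimpleFiles" });
```

Note that file names containing a dot (like data.zip) may additionally require a handler mapping or runAllManagedModulesForAllRequests in web.config, so that IIS hands such requests over to Web API instead of treating them as static files.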

It is a basic version, yet it works fine. If you wanted to use it in more advanced scenarios, however, there are a couple of potential problems to face.
First of all, when the transfer is interrupted for whatever reason, the client has to start downloading from the beginning. This is unacceptable when serving large files and would be a major annoyance for people on mobile connections that drop often. Another problem is that the implementation above is not very client-friendly in terms of HTTP support (e.g. the HEAD verb).

Adding resume support

There are two main areas that we need to add more logic to in order to introduce pause/resume functionality:

  • extend HTTP protocol support - most importantly by handling the Range header properly,
  • use a Stream that is capable of returning a portion of a file (from byte A to byte B).
Why should we implement the HEAD verb in the controller? Let's imagine we were to write software that downloads large files over HTTP using our service. Ideally we would like a mechanism that could tell us how big the file is (via the Content-Length header) and whether or not the service can serve us partial data (via the Accept-Ranges header) without actually downloading the body. This is exactly what HEAD does: it is designed to be identical to GET except that the server must not return a message body (headers only).
Accept-Ranges is returned by the server to indicate that it can return byte ranges of a requested resource. Moreover, when partial content is returned, the server should respond with a 206 Partial Content status code along with a Content-Range header describing the range returned. If the client requests a range that is out of bounds for a given resource, a 416 Requested Range Not Satisfiable status should be returned.
Here are two example exchanges, added for clarity.
HEAD http://localhost/Piotr.AspNetFileServer/api/files/data.zip HTTP/1.1  
User-Agent: Fiddler  
Host: localhost

HTTP/1.1 200 OK  
Content-Length: 1182367743  
Content-Type: application/octet-stream  
Accept-Ranges: bytes  
Server: Microsoft-IIS/8.0  
Content-Disposition: attachment; filename=data.zip

HEAD http://localhost/Piotr.AspNetFileServer/api/files/data.zip HTTP/1.1  
User-Agent: Fiddler  
Host: localhost  
Range: bytes=0-999

HTTP/1.1 206 Partial Content  
Content-Length: 1000  
Content-Type: application/octet-stream  
Content-Range: bytes 0-999/1182367743  
Accept-Ranges: bytes  
Server: Microsoft-IIS/8.0  
Content-Disposition: attachment; filename=data.zip
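And a hypothetical exchange for an out-of-bounds range (same file as above; the header values are made up for illustration):

```
HEAD http://localhost/Piotr.AspNetFileServer/api/files/data.zip HTTP/1.1
User-Agent: Fiddler
Host: localhost
Range: bytes=0-2000000000

HTTP/1.1 416 Requested Range Not Satisfiable
Server: Microsoft-IIS/8.0
```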

This is a helper class used to store some information passed in HTTP headers.

public class ContentInfo  
{
    public long From;
    public long To;
    public bool IsPartial;
    public long Length;
}
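To make the header-to-ContentInfo mapping concrete, here is how a few Range headers would populate these fields for a hypothetical 1000-byte file (values worked out from the RFC 7233 range semantics):

```csharp
// entityLength = 1000
// (no Range header)  -> From = 0,   To = 999, IsPartial = false, Length = 1000
// Range: bytes=0-499 -> From = 0,   To = 499, IsPartial = true,  Length = 500
// Range: bytes=500-  -> From = 500, To = 999, IsPartial = true,  Length = 500
// Range: bytes=-200  -> From = 800, To = 999, IsPartial = true,  Length = 200 (suffix range)
```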

The controller itself can look like this:

public class FilesController : ApiController  
{
    public IFileProvider FileProvider { get; set; }

    public FilesController()
    {
        FileProvider = new FileProvider();
    }

    public HttpResponseMessage Head(string fileName)
    {
        if (!FileProvider.Exists(fileName))
        {
            //if file does not exist return 404
            throw new HttpResponseException(HttpStatusCode.NotFound);
        }
        long fileLength = FileProvider.GetLength(fileName);
        ContentInfo contentInfo = GetContentInfoFromRequest(this.Request, fileLength);

        var response = new HttpResponseMessage();
        response.Content = new ByteArrayContent(new byte[0]);
        SetResponseHeaders(response, contentInfo, fileLength, fileName);
        return response;
    }

    public HttpResponseMessage Get(string fileName)
    {
        if (!FileProvider.Exists(fileName))
        {
            //if file does not exist return 404
            throw new HttpResponseException(HttpStatusCode.NotFound);
        }
        long fileLength = FileProvider.GetLength(fileName);
        ContentInfo contentInfo 
           = GetContentInfoFromRequest(this.Request, fileLength);
        var stream = new PartialReadFileStream(FileProvider.Open(fileName), 
                                               contentInfo.From, contentInfo.To);
        var response = new HttpResponseMessage();
        response.Content = new StreamContent(stream);
        SetResponseHeaders(response, contentInfo, fileLength, fileName);
        return response;
    }

    private ContentInfo GetContentInfoFromRequest(HttpRequestMessage request, long entityLength)
    {
        var result = new ContentInfo 
                    {
                        From = 0, To = entityLength - 1, 
                        IsPartial = false, Length = entityLength
                    };
        RangeHeaderValue rangeHeader = request.Headers.Range;
        if (rangeHeader != null && rangeHeader.Ranges.Count != 0)
        {
            //we support only one range
            if (rangeHeader.Ranges.Count > 1)
            {
                //arguably a different status code (or a multipart/byteranges response) would fit better here
                throw new HttpResponseException(HttpStatusCode.RequestedRangeNotSatisfiable);
            }
            RangeItemHeaderValue range = rangeHeader.Ranges.First();
            if (range.From.HasValue && (range.From < 0 || range.From > entityLength - 1)
                || range.To.HasValue && range.To > entityLength - 1)
            {
                throw new HttpResponseException(HttpStatusCode.RequestedRangeNotSatisfiable);
            }

            result.IsPartial = true;
            if (range.From.HasValue && range.To.HasValue)
            {
                result.From = range.From.Value;
                result.To = range.To.Value;
            }
            else if (range.From.HasValue)
            {
                result.From = range.From.Value;
                result.To = entityLength - 1;
            }
            else
            {
                //suffix range: "bytes=-500" means the last 500 bytes
                result.From = entityLength - range.To.Value;
                result.To = entityLength - 1;
            }
            result.Length = result.To - result.From + 1;
        }

        return result;
    }

    private void SetResponseHeaders(HttpResponseMessage response, ContentInfo contentInfo,
                                    long fileLength, string fileName)
    {
        response.Headers.AcceptRanges.Add("bytes");
        response.StatusCode = contentInfo.IsPartial ? HttpStatusCode.PartialContent
                                  : HttpStatusCode.OK;
        response.Content.Headers.ContentDisposition 
          = new ContentDispositionHeaderValue("attachment");
        response.Content.Headers.ContentDisposition.FileName 
          = fileName;
        response.Content.Headers.ContentType 
          = new MediaTypeHeaderValue("application/octet-stream");
        response.Content.Headers.ContentLength = contentInfo.Length;
        if (contentInfo.IsPartial)
        {
            response.Content.Headers.ContentRange
                = new ContentRangeHeaderValue(contentInfo.From, contentInfo.To, fileLength);
        }
    }
}

Another important part of this solution is a stream implementation that can return a byte range from a file. It is essentially a wrapper around FileStream. Please note that this code is largely untested; it merely illustrates the approach - you have been warned ;)

internal class PartialReadFileStream : Stream  
{
    private readonly long _start;
    private readonly long _end;
    private long _position;
    private FileStream _fileStream;
    public PartialReadFileStream(FileStream fileStream, long start, long end)
    {
        _start = start;
        _position = start;
        _end = end;
        _fileStream = fileStream;

        if (start > 0)
        {
            _fileStream.Seek(start, SeekOrigin.Begin);
        }
    }

    public override void Flush()
    {
        _fileStream.Flush();
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        if (origin == SeekOrigin.Begin)
        {
            _position = _start + offset;
            return _fileStream.Seek(_start + offset, origin);
        }
        else if (origin == SeekOrigin.Current)
        {
            _position += offset;
            return _fileStream.Seek(offset, origin);
        }
        else
        {
            throw new NotImplementedException("Seeking from SeekOrigin.End is not implemented");
        }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int byteCountToRead = count;
        if (_position + count > _end)
        {
            byteCountToRead = (int)(_end - _position) + 1;
        }
        if (byteCountToRead <= 0)
        {
            return 0; //end of the requested range
        }
        var result = _fileStream.Read(buffer, offset, byteCountToRead);
        _position += result;
        return result;
    }

    public override IAsyncResult BeginRead(byte[] buffer, int offset, int count,
       AsyncCallback callback, object state)
    {
        int byteCountToRead = count;
        if (_position + count > _end)
        {
            byteCountToRead = (int)(_end - _position) + 1;
        }
        var result = _fileStream.BeginRead(buffer, offset,
                           byteCountToRead, (s) =>
                                      {
                                          _position += byteCountToRead;
                                          callback(s);
                                      }, state);
        return result;
    }

    public override int EndRead(IAsyncResult asyncResult)
    {
        return _fileStream.EndRead(asyncResult);
    }

    public override int ReadByte()
    {
        int result = _fileStream.ReadByte();
        _position++;
        return result;
    }

    // ...

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _fileStream.Dispose();
        }
        base.Dispose(disposing);
    }
}
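The members elided above are the remaining Stream overrides. A minimal sketch of what they might look like (not shown in the original post, so treat this as one possible completion):

```csharp
public override bool CanRead { get { return true; } }
public override bool CanSeek { get { return true; } }
public override bool CanWrite { get { return false; } }

//length and position are relative to the requested range, not the whole file
public override long Length { get { return _end - _start + 1; } }
public override long Position
{
    get { return _position - _start; }
    set { Seek(value, SeekOrigin.Begin); }
}

public override void SetLength(long value) { throw new NotSupportedException(); }
public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
```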

If you think about this performance-wise, it's not the most efficient approach: every time a file is requested we need to read it from disk, and disks are very slow compared to RAM, so disk access may quickly become a bottleneck. It becomes evident that for more advanced scenarios some kind of caching mechanism would be a good optimization.

Using memory-mapped files

A memory-mapped file is a portion of virtual memory that has been mapped to a file. This is not a new concept - it has been around in Windows (and other OSes) for many years - but only recently (since .NET 4, that is) has it been made available to C# programmers as a managed API. Memory-mapped files allow processes to modify and read files as if they were reading and writing memory. If my memory serves me well, shared-memory IPC in Windows is actually implemented using this feature.


Please note that files are mapped, not copied, into virtual memory; from the program's perspective it's transparent, as Windows loads parts of the physical file as they are accessed by the application. Another advantage of MMFs is that the system performs transfers in 4 KB chunks of data (pages) and the virtual-memory manager (VMM) decides when those pages should be freed. Windows is highly optimized for page-related I/O operations and tries to minimize the number of times the hard disk head has to move. In other words, by using MMFs you get a guarantee that the OS will optimize disk access, and additionally you get a form of memory cache.
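As a minimal standalone illustration of the managed API (the file path here is made up for the example, and the file must be at least as large as the requested view):

```csharp
using System.IO.MemoryMappedFiles;

using (var mmf = MemoryMappedFile.CreateFromFile(@"C:\Temp\data.bin"))
using (var view = mmf.CreateViewStream(0, 4096)) //map only the first 4 KB
{
    //reads go through the page cache; no explicit file IO in our code
    int firstByte = view.ReadByte();
}
```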

Because files are mapped into virtual memory, to serve big files we need to run our application in 64-bit mode; otherwise it wouldn't be able to address all the space needed. For this example, make sure to change the target platform to x64 in the project properties.

public class MemMappedFilesController : ApiController  
{
    private const string MapNamePrefix = "FileServerMap";

    public IFileProvider FileProvider { get; set; }

    public MemMappedFilesController()
    {
        FileProvider = new FileProvider();
    }

    private ContentInfo GetContentInfoFromRequest(HttpRequestMessage request, long entityLength)
    {
        //...
    }

    private void SetResponseHeaders(HttpResponseMessage response, ContentInfo contentInfo,
        long fileLength, string fileName)
    {
        //...
    }

    public HttpResponseMessage Head(string fileName)
    {
        if (!FileProvider.Exists(fileName))
        {
            //if file does not exist return 404
            throw new HttpResponseException(HttpStatusCode.NotFound);
        }
        long fileLength = FileProvider.GetLength(fileName);
        ContentInfo contentInfo = GetContentInfoFromRequest(this.Request, fileLength);

        var response = new HttpResponseMessage();
        response.Content = new ByteArrayContent(new byte[0]);
        SetResponseHeaders(response, contentInfo, fileLength, fileName);
        return response;
    }

    public HttpResponseMessage Get(string fileName)
    {
        if (!FileProvider.Exists(fileName))
        {
            //if file does not exist return 404
            throw new HttpResponseException(HttpStatusCode.NotFound);
        }
        long fileLength = FileProvider.GetLength(fileName);
        ContentInfo contentInfo = GetContentInfoFromRequest(this.Request, fileLength);
        string mapName = GenerateMapNameFromName(fileName);

        MemoryMappedFile mmf = null;
        try
        {
            mmf = MemoryMappedFile.OpenExisting(mapName, MemoryMappedFileRights.Read);
        }
        catch (FileNotFoundException)
        {
            //every time we use an exception to control flow a kitten dies

            mmf = MemoryMappedFile
                .CreateFromFile(FileProvider.Open(fileName), mapName, fileLength,
                                MemoryMappedFileAccess.Read, null, HandleInheritability.None,
                                false);
        }
        using (mmf)
        {
            Stream stream
                = contentInfo.IsPartial
                ? mmf.CreateViewStream(contentInfo.From, 
                contentInfo.Length, MemoryMappedFileAccess.Read)
                : mmf.CreateViewStream(0, fileLength, 
                MemoryMappedFileAccess.Read);

            var response = new HttpResponseMessage();
            response.Content = new StreamContent(stream);
            SetResponseHeaders(response, contentInfo, fileLength, fileName);
            return response;
        }
    }

    private string GenerateMapNameFromName(string fileName)
    {
        return String.Format("{0}_{1}", MapNamePrefix, fileName);
    }
}

I've removed code that is identical to FilesController. Please note that we have a 1:1 relationship between a file (or its name, to be more precise) and a map name. This means we reuse the same map for all requests asking for the same file.

Both controllers should provide pause/resume functionality.

I hope you find this post useful. The complete source code is available, as usual, on Bitbucket. Enjoy!
