10 October 2015

Archiving files in SQL Server as BLOB

In one of my projects, we had requirement of archiving text and excel files in database. We discovered that this can be done as BLOB fields. BLOB stands for Binary Large Objects. This data-type is available in most of the modern databases. In SQL Server 2005 - VARCHAR (MAX), NVARCHAR (MAX) and VARBINARY (MAX) data-types provides this functionality.

For achieving our goal, we had to take care of following things:-

  1. Designing a table structure to store file metadata and file contents.
  2. Writing stored procedures to insert and retrieve data from archival table.
  3. Writing .NET application code which can store files and retrieve them when needed.
Designing a table structure to store files

First we need a table to store our archived data. Let’s name this table as “tblFileArchival”. The structure of table is shown below:-











The table has following fields:-

  1. ArchivalID – Auto incrementing (identity) integer field.
  2. ArchivalDateTime – Field for storing date and time of file archival
  3. ArchivalFileName – Field for storing name of file
  4. ArchivedFileSource – Field for storing source of file. This can be an optional field for your requirement. Hence it can be marked as Nullable if needed.
  5. ArchivedData – Field for storing actual file data. Please note that data type for this field is varbinary (max).
  6. ArchivedFileExtn – Extension of stored file. (Eg.- xls, txt)
  7. ArchivedFileIODirection – In our application, we have to store whether the file is an input to our system or output from our system. Hence this field was used. This may not be required for your requirement. Hence it can be marked as Nullable if needed.
The SQL for creating the table is as follows:-

CREATE TABLE [dbo].[tblFileArchival](
      [ArchivalID] [int] IDENTITY(1,1) NOT NULL,
      [ArchivalDateTime] [datetime] NOT NULL,
      [ArchivedFileName] [nvarchar](100) NOT NULL,
      [ArchivedFileSource] [nvarchar](15) NOT NULL,
      [ArchivedData] [varbinary](max) NOT NULL,
      [ArchivedFileExtn] [nvarchar](4) NOT NULL,
      [ArchivedFileIODirection] [nvarchar](6) NOT NULL,
 CONSTRAINT [PK_tblFileArchival] PRIMARY KEY CLUSTERED
(
      [ArchivalID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

Stored Procedure to insert data into table

We need a stored procedure to insert data into table. This procedure will take following input parameters:-

  • Name of file to be archived.
  • Source of file to be archived.
  • Data of file.
  • Extension of file to be archived.
  • File IO direction.

The SQL for stored procedure is as follows:-

CREATE PROCEDURE [dbo].[uspArchiveFile]
      (
      @ArchivedFileName nvarchar(100),
      @ArchivedFileSource nvarchar(15),
      @ArchivedData varbinary(max),
      @ArchivedFileExtn nvarchar(4),
      @ArchivedFileIODirection nvarchar(6)
      )
AS
BEGIN

INSERT INTO [tblFileArchival]
           ([ArchivalDateTime]
           ,[ArchivedFileName]
           ,[ArchivedFileSource]
           ,[ArchivedData]
           ,[ArchivedFileExtn]
           ,[ArchivedFileIODirection])
     VALUES
           (GETDATE()
           ,@ArchivedFileName
           ,@ArchivedFileSource
           ,@ArchivedData
           ,@ArchivedFileExtn
           ,@ArchivedFileIODirection);
END

Stored Procedure to retrieve data from table

We need a stored procedure to retrieve archived file metadata and file data from the table. This stored procedure will take one input parameter - “ArchivalID” of the file whose data needs to be retrieved and will return that data.

The SQL for stored procedure is as follows:-

CREATE PROCEDURE [dbo].[uspGetArchivedFile]
      @ArchivalId int
AS
BEGIN
      SELECT [ArchivalID]
              ,[ArchivalDateTime]
              ,[ArchivedFileName]
              ,[ArchivedFileSource]
              ,[ArchivedData]
              ,[ArchivedFileExtn]
              ,[ArchivedFileIODirection]
      FROM tblFileArchival
     WHERE [ArchivalID] = @ArchivalId
END

.NET Code to archive file in database

To archive a file in database in a field of type VARBINARY (MAX), we need to read the contents of file and pass them to our insert stored procedure as an array of bytes.

The following code shows how to achieve the same.

FileStream fileStream = null;

try
{
    //Open the file and read its contents in a byte array using file stream
    fileStream = File.OpenRead(FilePath);
    int length = (int)fileStream.Length;
    byte[] byteArray = new byte[length];

    fileStream.Read(byteArray, 0, length);

    //Place the file details and content in database
    ArchiveDataToDB(FileName,
                    ReceivedFrom,
                    byteArray,
                    FileExtn,
                    FileIODirection);
}
catch (Exception ex)
{
    throw ex;
}
finally
{
    if (fileStream != null)
    {
        fileStream.Close();
        fileStream = null;
    }
}

Here we have all the required file properties in following variables:-

  • FilePath
  • FileName
  • ReceivedFrom
  • FileExtn
  • FileIODirection

Before calling the stored procedure we need the contents of file also. This is achieved using following steps:-

  • Create a fileStream object and open the file using it.
  • Get the length of the file using fileStream.Length.
  • Create a byte array of size fileSream.Length.
  • Read the file from start till end in this byte array using fileStream.Read method.

Now call the function “ArchiveDataToDB” passing all file information as input to it. The function “ArchiveDataToDB” contains ADO.NET code which calls insert stored procedure – “uspArchiveFile” passing all input parameters and stores the data in SQL Server. For brevity, the code for function “ArchiveDataToDB” is not given as this contains generic ADO.NET code used for executing a stored procedure. This type of code is generally project specific.

.NET code to retrieve archived file from database

There is no use of archiving a file in database if we can’t retrieve it. For regenerating the file from archived data, we need to write custom .NET code. Following code shows how to achieve it:-

FileStream fileStream = null;
BinaryWriter binaryWriter = null;
MemoryStream memStream = null;
byte[] binary = null;
const int chunkSize = 100;
int sizeToWrite = 0;

try
{
    //Get archived file metadata(details) and content from database
    DataSet dsArchivedFile = GetArchivedFile(archivalId);
    DataTable dtArchivedFile = dsArchivedFile.Tables[0];

    //if datatable is empty, it means no such file exists in database
    if (dtArchivedFile.Rows.Count == 0)
    {
        return "No file exists with archival id - " + archivalId;
    }

    //Get file name and extension
    string archivedFileName = dtArchivedFile.Rows[0]["ArchivedFileName"].ToString();
    string archivedFileExtn = dtArchivedFile.Rows[0]["ArchivedFileExtn"].ToString();

    if (!filePathToStore.EndsWith(@"\"))
    {
        filePathToStore += @"\";
    }

    //Create file path where file needs to be created
    filePathToStore = filePathToStore + archivedFileName + "." + archivedFileExtn.ToLower();

    //Create the file stream
    fileStream = new FileStream(filePathToStore, FileMode.Create, FileAccess.Write);
    binaryWriter = new BinaryWriter(fileStream);

    //Get file contents from Database in a byte array
    binary = (byte[])dtArchivedFile.Rows[0]["ArchivedData"];
    sizeToWrite = chunkSize;

    memStream = new MemoryStream(binary);

    //using chunksize, read the byte array and write it to binary writer
    for (int i = 0; i < binary.GetUpperBound(0) - 1; i = i + chunkSize)
    {
        if (i + chunkSize >= binary.Length)
        {
            sizeToWrite = binary.Length - i;
        }

        byte[] chunk = new byte[sizeToWrite];
        memStream.Read(chunk, 0, sizeToWrite);
        binaryWriter.Write(chunk);
        binaryWriter.Flush();
    }

    return "File generated at " + filePathToStore;
}
catch (Exception ex)
{
    throw ex;
}
finally
{

    if (fileStream != null)
    {
        fileStream.Close();
        fileStream.Dispose();
    }

    if (binaryWriter != null)
    {
        binaryWriter.Close();
    }
}

First we get all file details including file data using the stored procedure “uspGetArchivedFile” we created earlier. This stored procedure is executed by the function “GetArchivedFile” which takes “archivalID” as input and uses ADO.NET code to execute the stored procedure and get file details & data in a dataset. For brevity, the code for function “GetArchivedFile” is not given as this contains generic ADO.NET code used for executing a stored procedure. This type of code is generally project specific.

To generate the file from dataset, following steps are taken:-

  • Check if dataset is empty. If yes, we return an error message else we retrieve various file properties like filename, fileExtn in appropriate variables.
  • Create file path where file needs to be stored. This is achieved by specifying directory and appending file name and file extension to it.
  • Create a fileStream object in File create mode and specify the path where file needs to be stored. Open the fileStream object in write mode.
  • Create a binary writer from the file stream object.
  • Get the file contents from dataset in a byte array.
  • Now read this byte array by chunk-size of 100 and write it to binary writer.

The file will be generated at the mentioned file path.

Conclusion

Storing file information and data in SQL server provides a secure, reliable and cost effective way to archive files. The file data can be easily managed. It is also secure because only users who have access to SQL server can reach the table where file data is stored. Since file data is stored in VARBINARY field, it is in hexadecimal format which cannot be interpreted by naked eyes. Hence this provides additional security for data.

No comments: