Showing posts with label iTextSharp. Show all posts

How to extract text from PDF file using iTextSharp with C#

In this tutorial, I am going to explain how to extract text from a PDF file using iTextSharp with C# in ASP.NET. Below is a step-by-step tutorial.

Creating ASP.NET Empty Application

Create an ASP.NET Empty Web Forms project as shown below.
Go to File → New → Project. A new window will open, as shown below.
Now go to Web → Visual Studio 2012 → select .NET Framework 4.5 → select ASP.NET Empty Web Application, give the project a name and click OK.

Creating asp.net 4.5 empty project

Now an empty ASP.NET project will be created. Add a new Web Form to the application.

Installing iTextSharp

Now, the next step is to add a reference to iTextSharp in your application. There are two ways to do this.
First: Download from the Internet
Click the link below to download the DLL.
https://github.com/itext/itextsharp
Once the file is downloaded, extract it; you will find six more .rar files. Extract itextsharp-dll-core.rar, and then add a reference to itextsharp.dll to your project.
Second: NuGet Package Manager
Go to TOOLS → Library Package Manager → Manage NuGet Packages for Solution... A new window will open. Search for iTextSharp and click the Install button, as shown below. Once it is installed successfully, you can see iTextSharp in the References folder.

Adding iTextSharp
Installing iTextSharp

You can also install it by using the Package Manager Console.
Go to TOOLS → Library Package Manager → Package Manager Console, type Install-Package iTextSharp and press Enter. This installs iTextSharp in the application.

In aspx file

In the designer file, create two Button controls and one TextBox control: the first button generates the PDF file, the second button extracts text from it, and the multi-line TextBox displays the extracted text. The designer file looks as shown below.

Aspx designer file
<%@ Page Language="C#" AutoEventWireup="true" CodeBehind="WebForm1.aspx.cs" Inherits="WebApplication1.WebForm1" %>
 
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
<title></title>
</head>
<body>
<form id="form1" runat="server">
<div>
    <table>
        <tr>
            <td><b>Extract Text from PDF file using iTextSharp</b></td>
        </tr>
        <tr>
            <td>
                <asp:Button ID="btnGeneratePDF" runat="server" Text="Generate PDF File" OnClick="btnGeneratePDF_Click" />
            </td>
        </tr>
        <tr>
            <td>
                <asp:Button ID="btnExtract" runat="server" Text="Extract Text From PDF File" OnClick="btnExtract_Click" />
            </td>
        </tr>
        <tr>
            <td>
                <asp:TextBox ID="TextBox1" runat="server" TextMode="MultiLine" Style="width: 500px; min-height: 150px;"> 
                </asp:TextBox>
            </td>
        </tr>
    </table>
</div>
</form>
</body>
</html>

C# Code

The complete C# code is given below.

using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
 
namespace WebApplication1
{
public partial class WebForm1 : System.Web.UI.Page
{
protected void Page_Load(object sender, EventArgs e)
{
 
}
 
protected void btnGeneratePDF_Click(object sender, EventArgs e)
{
    if (File.Exists(Server.MapPath("Example.pdf")))
    {
        File.Delete(Server.MapPath("Example.pdf"));
    }
 
    // create pdf file and save it to the root directory of the application 
    FileStream fs = new FileStream(Server.MapPath("Example.pdf"), FileMode.Create);
 
    Document doc = new Document();
 
    PdfWriter.GetInstance(doc, fs);
 
    doc.Open();
 
    Paragraph page = new Paragraph("This is first page (page number 1)");
    doc.Add(page);
 
    Paragraph para1 = new Paragraph();
    Chunk c1 = new Chunk(@"This is first paragraph. This is first paragraph. This is first paragraph. This is first paragraph. This is first paragraph. This is first paragraph. This is first paragraph. This is first paragraph. This is first paragraph.");
    c1.SetBackground(BaseColor.YELLOW);
    para1.Add(c1);
    doc.Add(para1);
 
    Paragraph para2 = new Paragraph();
    Chunk c2 = new Chunk(@"This is second paragraph. This is second paragraph. This is second paragraph. This is second paragraph. This is second paragraph. This is second paragraph. This is second paragraph. This is second paragraph. This is second paragraph.");
    c2.SetBackground(BaseColor.GREEN);
    para2.Add(c2);
    doc.Add(para2);
 
    doc.Close();
}
 
protected void btnExtract_Click(object sender, EventArgs e)
{
    //string FilePath = @"H:\Demo\WebApplication1\WebApplication1\Example.pdf";

    string FilePath = Server.MapPath("Example.pdf");

    if (File.Exists(FilePath))
    {
        string ExtractedData = string.Empty;

        using (PdfReader reader = new PdfReader(FilePath))
        {
            ITextExtractionStrategy strategy = new iTextSharp.text.pdf.parser.LocationTextExtractionStrategy();

            // 1. if the PDF document has only one page
            // the second parameter is the page number (pages start at 1)
            ExtractedData = PdfTextExtractor.GetTextFromPage(reader, 1, strategy);


            /*// 2. if the PDF document has more than one page,
            // iterate through all pages and append each page's text;
            // a new strategy is created per page because a strategy keeps the text it has already collected
            System.Text.StringBuilder allPages = new System.Text.StringBuilder();
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                allPages.AppendLine(PdfTextExtractor.GetTextFromPage(reader, i, new LocationTextExtractionStrategy()));
            }
            ExtractedData = allPages.ToString();*/


            /*// 3. if a single page has more than one paragraph,
            // split the extracted text on newline characters
            ExtractedData = PdfTextExtractor.GetTextFromPage(reader, 1, new LocationTextExtractionStrategy());
            string[] lines = ExtractedData.Split('\n');
            System.Text.StringBuilder sb = new System.Text.StringBuilder();
            foreach (string line in lines)
            {
                sb.AppendLine(line.Trim()); // process each line here as needed
            }
            ExtractedData = sb.ToString();*/
 
        }
        TextBox1.Text = ExtractedData;
    }
}
}
}

When you click the Generate PDF File button, a PDF file named Example.pdf is generated and saved in the root directory of the application. When you open the PDF file, you will see three paragraphs, as shown below.

PDF file generated using iTextSharp

Now, when you click Extract Text From PDF File, all the text from page one is extracted and displayed in the TextBox. To extract every page, iterate through the pages with a for loop; that code is included, commented out, in the listing above.

Extract text from PDF using iTextSharp
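The page loop described above can also be pulled out into a small helper that runs outside a web page. The sketch below assumes the iTextSharp 5.x NuGet package; `PdfTextHelper`, `ExtractAllPages` and `CreateTwoPagePdf` are hypothetical names used only for illustration. It creates a new `LocationTextExtractionStrategy` for every page, because a strategy instance keeps the text it has already collected, and it appends each page's text instead of overwriting it.

```csharp
using System.IO;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper class (not part of iTextSharp).
public static class PdfTextHelper
{
    // Returns the text of every page, one page after another.
    public static string ExtractAllPages(byte[] pdfBytes)
    {
        var sb = new StringBuilder();
        using (var reader = new PdfReader(pdfBytes))
        {
            // PDF page numbers start at 1
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // fresh strategy per page so earlier pages are not repeated
                ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
                sb.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page, strategy));
            }
        }
        return sb.ToString();
    }

    // Builds a small two-page PDF in memory as sample input.
    public static byte[] CreateTwoPagePdf()
    {
        using (var ms = new MemoryStream())
        {
            var doc = new Document();
            PdfWriter.GetInstance(doc, ms);
            doc.Open();
            doc.Add(new Paragraph("This is first page (page number 1)"));
            doc.NewPage();
            doc.Add(new Paragraph("This is second page (page number 2)"));
            doc.Close();          // also closes the MemoryStream
            return ms.ToArray();  // ToArray still works on a closed MemoryStream
        }
    }
}
```

In the web page, the same helper could be called as `PdfTextHelper.ExtractAllPages(File.ReadAllBytes(Server.MapPath("Example.pdf")))` to read every page of the generated file.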

Generate Pay slip / Salary slip PDF using iTextSharp in ASP.NET

In this article, I am going to explain how to generate an employee pay slip (salary slip) in PDF format using iTextSharp in ASP.NET. I will be using Visual Studio 2013 Professional. Before continuing, let's look at the screenshots.

When the page loads for the first time, you will see the screen below. Select an employee ID and a month, and click the Generate button to download the salary slip. I have hard-coded some employee IDs and months in the DropDownList controls in the aspx file.

Generate salary slip using itextsharp in asp.net


Below is the step-by-step tutorial.

Creating table and procedure

Below are the scripts to create two tables, tbl_EmployeeDetails and tbl_SalaryDetails, for employee details and salary details.

Export DataTable to PDF file using iTextSharp and download/transmit at client machine

In this article, I am going to explain how to export a DataTable to a PDF file using iTextSharp in C# and download (transmit) it to the client machine. First, you need to download the iTextSharp DLL from the Internet. Click the link below to download it.
https://github.com/itext/itextsharp

Once the file is downloaded, extract it; you will find six more .rar files. Extract itextsharp-dll-core.rar, and then add a reference to itextsharp.dll to your project.

Related Articles

  1. How to export GridView data into PDF using iTextSharp in asp.net with C#
  2. Insert an image into PDF using iTextSharp with C# (C-Sharp)
  3. How to add meta information of PDF file using iTextSharp with C-Sharp
  4. How to extract images from a pdf file using C#.Net

In Code-Behind File

Add the following namespaces.

using System.Data;
using iTextSharp.text;
using iTextSharp.text.pdf;

C# Code Snippet

Below is the complete C# code, which generates a PDF file from a DataTable filled with dummy data.

protected void Page_Load(object sender, EventArgs e)
{
    if(!IsPostBack)
    {
        ExportDataTableToPdfandDownloadAtClient();
    }
}

private void ExportDataTableToPdfandDownloadAtClient()
{
    // create a DataTable and add dummy data
    DataTable dtEmployee = new DataTable();
    dtEmployee.Columns.Add("EmpId", typeof(Int32));
    dtEmployee.Columns.Add("Name", typeof(string));
    dtEmployee.Columns.Add("Gender", typeof(string));
    dtEmployee.Columns.Add("Salary", typeof(Int32));
    dtEmployee.Columns.Add("Country", typeof(string));
    dtEmployee.Rows.Add(1, "Rahul", "Male", 60000, "India");
    dtEmployee.Rows.Add(2, "John", "Male", 50000, "USA");
    dtEmployee.Rows.Add(3, "Mary", "Female", 75000, "UK");
    dtEmployee.Rows.Add(4, "Mathew", "Male", 80000, "Australia");

    // creating document object
    iTextSharp.text.Rectangle rec = new iTextSharp.text.Rectangle(PageSize.A4);
    rec.BackgroundColor = new BaseColor(System.Drawing.Color.Olive);
    Document doc = new Document(rec);
    doc.SetPageSize(iTextSharp.text.PageSize.A4);
    PdfWriter writer = PdfWriter.GetInstance(doc, Response.OutputStream);
    doc.Open();

    //Creating paragraph for header
    BaseFont bfntHead = BaseFont.CreateFont(BaseFont.TIMES_ROMAN, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
    iTextSharp.text.Font fntHead = new iTextSharp.text.Font(bfntHead, 16, 1, iTextSharp.text.BaseColor.ORANGE);
    Paragraph prgHeading = new Paragraph();
    prgHeading.Alignment = Element.ALIGN_LEFT;
    prgHeading.Add(new Chunk("Employee Details".ToUpper(), fntHead));
    doc.Add(prgHeading);

    //Adding paragraph for report generated by
    Paragraph prgGeneratedBY = new Paragraph();
    BaseFont btnAuthor = BaseFont.CreateFont(BaseFont.TIMES_ROMAN, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
    iTextSharp.text.Font fntAuthor = new iTextSharp.text.Font(btnAuthor, 8, 2, iTextSharp.text.BaseColor.BLUE);
    prgGeneratedBY.Alignment = Element.ALIGN_RIGHT;
    prgGeneratedBY.Add(new Chunk("Report Generated by : ASPArticles", fntAuthor));
    prgGeneratedBY.Add(new Chunk("\nGenerated Date : " + DateTime.Now.ToShortDateString(), fntAuthor));
    doc.Add(prgGeneratedBY);

    //Adding a line
    Paragraph p = new Paragraph(new Chunk(new iTextSharp.text.pdf.draw.LineSeparator(0.0F, 100.0F, iTextSharp.text.BaseColor.BLACK, Element.ALIGN_LEFT, 1)));
    doc.Add(p);

    //Adding line break
    doc.Add(new Chunk("\n", fntHead));

    //Adding  PdfPTable
    PdfPTable table = new PdfPTable(dtEmployee.Columns.Count);

    for (int i = 0; i < dtEmployee.Columns.Count; i++)
    {
        string cellText = Server.HtmlDecode(dtEmployee.Columns[i].ColumnName);
        PdfPCell cell = new PdfPCell();
        cell.Phrase = new Phrase(cellText, new iTextSharp.text.Font(iTextSharp.text.Font.FontFamily.TIMES_ROMAN, 10, 1, new BaseColor(System.Drawing.ColorTranslator.FromHtml("#ffffff"))));
        cell.BackgroundColor = new BaseColor(System.Drawing.ColorTranslator.FromHtml("#990000"));
        cell.HorizontalAlignment = Element.ALIGN_CENTER;
        cell.PaddingBottom = 5;
        table.AddCell(cell);
    }

    //writing table Data
    for (int i = 0; i < dtEmployee.Rows.Count; i++)
    {
        for (int j = 0; j < dtEmployee.Columns.Count; j++)
        {
            table.AddCell(dtEmployee.Rows[i][j].ToString());
        }
    }

    doc.Add(table);
    doc.Close();
    writer.Close();
    Response.ContentType = "application/pdf";
    Response.AddHeader("content-disposition", "attachment;" + "filename=EmployeeDetails.pdf");
    Response.Cache.SetCacheability(HttpCacheability.NoCache);
    Response.End();
}

Below is the PDF file that will be downloaded to the client machine.

Export datatable to pdf file using itextsharp
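The code above writes straight into `Response.OutputStream`. An alternative, sketched below under the assumption of iTextSharp 5.x, is to build the PDF in a `MemoryStream` first, which keeps PDF generation testable outside a web page. `DataTablePdfExporter` and `ToPdfBytes` are hypothetical names, and the styling is reduced to the header colors used above.

```csharp
using System;
using System.Data;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper class (not part of iTextSharp).
public static class DataTablePdfExporter
{
    // Renders the column headers and rows of the DataTable into a PdfPTable
    // and returns the finished PDF as a byte array.
    public static byte[] ToPdfBytes(DataTable table)
    {
        using (var ms = new MemoryStream())
        {
            var doc = new Document(PageSize.A4);
            PdfWriter.GetInstance(doc, ms);
            doc.Open();

            var pdfTable = new PdfPTable(table.Columns.Count);
            var headerFont = new Font(Font.FontFamily.TIMES_ROMAN, 10, Font.BOLD, BaseColor.WHITE);

            // header row: white bold text on a #990000 background
            foreach (DataColumn column in table.Columns)
            {
                var cell = new PdfPCell(new Phrase(column.ColumnName, headerFont));
                cell.BackgroundColor = new BaseColor(0x99, 0x00, 0x00);
                cell.HorizontalAlignment = Element.ALIGN_CENTER;
                cell.PaddingBottom = 5;
                pdfTable.AddCell(cell);
            }

            // data rows: one plain text cell per value
            foreach (DataRow row in table.Rows)
            {
                foreach (object value in row.ItemArray)
                {
                    pdfTable.AddCell(Convert.ToString(value));
                }
            }

            doc.Add(pdfTable);
            doc.Close();          // closes the writer and the MemoryStream
            return ms.ToArray();  // ToArray still works after the stream is closed
        }
    }
}
```

In the page, the returned bytes could then be sent with `Response.BinaryWrite(bytes)` after setting `Response.ContentType` and the `content-disposition` header, followed by `Response.End()`.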



How to extract images from a pdf file using C#.Net


In this article, we are going to learn how to extract images from a PDF file using iTextSharp in ASP.NET with C#. First, you need to download the iTextSharp DLL from the Internet. Click the link below to download it.

https://github.com/itext/itextsharp

Related Articles

  1. How to generate PDF file using iTextSharp in C#
  2. How to export GridView data into PDF using iTextSharp in asp.net with C#
  3. Insert an image into PDF using iTextSharp with C# (C-Sharp)
  4. How to add meta information of PDF file using iTextSharp with C-Sharp

Once the file is downloaded, extract it; you will find six more .rar files. Extract itextsharp-dll-core.rar, and then add a reference to itextsharp.dll to your project.

In Code-Behind File

Add the following namespaces.

using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

Complete C# Code

namespace WebApplication1
{
    public partial class WebForm1 : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            if (!IsPostBack)
            {
                ExtractImage();
            }
        }

        public void ExtractImage()
        {
            // folder where the extracted images will be saved
            String path = "E:/";
            // open the existing PDF; the using block closes the reader when done
            using (PdfReader reader = new PdfReader("E:/Example.pdf"))
            {
                // number of objects in the PDF document's cross-reference table
                int n = reader.XrefSize;
                for (int i = 0; i < n; i++)
                {
                    // get the object at index i; skip it if it is missing or not a stream
                    PdfObject po = reader.GetPdfObject(i);
                    if (po == null || !po.IsStream())
                        continue;
                    // cast the object to a stream
                    PRStream pst = (PRStream)po;
                    // get the object subtype and keep only image objects
                    PdfObject type = pst.Get(PdfName.SUBTYPE);
                    if (type != null && type.ToString().Equals(PdfName.IMAGE.ToString()))
                    {
                        PdfImageObject pio = new PdfImageObject(pst);
                        // read the image bytes into an array
                        byte[] imgdata = pio.GetImageAsBytes();
                        // GetFileType() returns the extension that matches the image data (e.g. jpg or png)
                        string fileName = path + "image" + i + "." + pio.GetFileType();
                        // write the byte array to the file; the using block closes the stream
                        using (FileStream fs = new FileStream(fileName, FileMode.Create))
                        {
                            fs.Write(imgdata, 0, imgdata.Length);
                        }
                    }
                }
            }
        }
    }
}
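Scanning the cross-reference table, as above, finds every image stream in the file, including images that are never drawn on a page (for example soft masks). A different technique, shown here as a sketch assuming iTextSharp 5.x, is to let `PdfReaderContentParser` walk each page's content and report images through the `IRenderListener` interface. `ImageCollector` and `PdfImageExtractor` are hypothetical names; an image drawn several times is collected once per drawing, and images whose encoding `PdfImageObject` cannot decode may throw an exception.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical listener: receives parsing events and keeps the images drawn on a page.
public class ImageCollector : IRenderListener
{
    // file extension (e.g. "jpg", "png") paired with the image bytes
    public readonly List<KeyValuePair<string, byte[]>> Images =
        new List<KeyValuePair<string, byte[]>>();

    public void RenderImage(ImageRenderInfo renderInfo)
    {
        PdfImageObject image = renderInfo.GetImage();
        Images.Add(new KeyValuePair<string, byte[]>(image.GetFileType(), image.GetImageAsBytes()));
    }

    // text events are not needed for image extraction
    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderText(TextRenderInfo renderInfo) { }
}

// Hypothetical helper class (not part of iTextSharp).
public static class PdfImageExtractor
{
    // Parses every page and returns the images in the order they are drawn.
    public static List<KeyValuePair<string, byte[]>> ExtractImages(byte[] pdfBytes)
    {
        var collector = new ImageCollector();
        using (var reader = new PdfReader(pdfBytes))
        {
            var parser = new PdfReaderContentParser(reader);
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                parser.ProcessContent(page, collector);
            }
        }
        return collector.Images;
    }
}
```

Each returned pair can be written to disk with `File.WriteAllBytes`, using the pair's key as the file extension.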

How to add meta information of PDF file using iTextSharp with C-Sharp

In this article, we are going to learn how to add meta information to a PDF file using iTextSharp in ASP.NET with C#. First, you need to download the iTextSharp DLL from the Internet. Click the link below to download it.

https://github.com/itext/itextsharp

Once the file is downloaded, extract it; you will find six more .rar files. Extract itextsharp-dll-core.rar, and then add a reference to itextsharp.dll to your project.

Related Articles

  1. How to generate PDF file using iTextSharp in C#
  2. How to export GridView data into PDF using iTextSharp in asp.net with C#
  3. Insert an image into PDF using iTextSharp with C# (C-Sharp)
  4. How to extract images from a pdf file using C#.Net

In Code-Behind File

Add the following namespaces.

using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

Complete C# code:

protected void Page_Load(object sender, EventArgs e)
{
    if (!IsPostBack)
    {
        AddMetaInfo();
    }
}
public void AddMetaInfo()
{
    // create filestream object
    FileStream fs = new FileStream(Server.MapPath("Example.pdf"), FileMode.Create);

    // create document object
    Document doc = new Document();

    // create a PdfWriter instance that writes the document to the file stream
    PdfWriter.GetInstance(doc, fs);

    // open the document
    doc.Open();

    // creating paragraph object
    Paragraph para = new Paragraph("Add meta information");
    para.Alignment = Element.ALIGN_CENTER;

    // Adding meta info 
    doc.AddTitle("C# iTextSharp");
    doc.AddAuthor("ASPArticles.com");
    doc.AddSubject("Adding meta information of pdf");
    doc.AddKeywords("ASP.Net Articles, iTextSharp, PDF, add Meta Info to PDF");
    doc.AddCreator("iTextSharp dll");

    // add the paragraph to the document
    doc.Add(para);

    doc.Close();
}

Once the PDF file is generated, open it and go to File → Properties; you will see the meta information, as shown below.

Adding meta information of PDF file using iTextSharp
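Instead of checking File → Properties in a PDF viewer, the document information dictionary can also be read back in code through `PdfReader.Info`. The sketch below assumes iTextSharp 5.x and builds the PDF in memory rather than with `Server.MapPath`; `PdfMetaInfo`, `CreateWithMetaInfo` and `ReadMetaInfo` are hypothetical names. It calls the `Add...` methods before `doc.Open()`, which iTextSharp permits for meta information (only content elements require an open document).

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper class (not part of iTextSharp).
public static class PdfMetaInfo
{
    // Creates a one-paragraph PDF in memory with the same meta information
    // as the AddMetaInfo example.
    public static byte[] CreateWithMetaInfo()
    {
        using (var ms = new MemoryStream())
        {
            var doc = new Document();
            PdfWriter.GetInstance(doc, ms);
            doc.AddTitle("C# iTextSharp");
            doc.AddAuthor("ASPArticles.com");
            doc.AddSubject("Adding meta information of pdf");
            doc.AddKeywords("ASP.Net Articles, iTextSharp, PDF, add Meta Info to PDF");
            doc.AddCreator("iTextSharp dll");
            doc.Open();
            doc.Add(new Paragraph("Add meta information"));
            doc.Close();          // also closes the MemoryStream
            return ms.ToArray();
        }
    }

    // Reads the information dictionary (keys such as "Title", "Author", "Subject").
    public static Dictionary<string, string> ReadMetaInfo(byte[] pdfBytes)
    {
        using (var reader = new PdfReader(pdfBytes))
        {
            return new Dictionary<string, string>(reader.Info);
        }
    }
}
```

For the file created by `AddMetaInfo`, the same check could be done with `PdfMetaInfo.ReadMetaInfo(File.ReadAllBytes(Server.MapPath("Example.pdf")))`.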