A research-grade scientific document summarization system built for rigorous evaluation and deployment.
This repository delivers a complete long-form research paper summarizer that converts academic PDFs into structured, evidence-aware summaries with section-level breakdowns, citation-aware media alignment, and fact consistency auditing.
Key outcomes:
- Structured extraction of paper metadata, sections, citations, figures, and tables
- Section-aware and graph-informed summarization to preserve document logic
- Factual consistency checks and summary revision to reduce hallucinations
- Media segmentation evaluation for figures and tables
- Reproducible experimental workflow and publication-ready metric outputs
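This README does not define "graph-informed" in detail. One plausible reading is that each section becomes a node linked to its neighbours in reading order and to any section it mentions by title, and that short excerpts from linked sections are passed to the summarizer as context. The sketch below assumes that reading; `Section`, `build_section_graph`, and `context_for` are hypothetical names, not the repository's API.

```python
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    text: str


def build_section_graph(sections):
    """Link each section to its reading-order neighbours and to any section
    whose title it mentions (an assumed, simplified edge rule)."""
    graph = {i: set() for i in range(len(sections))}
    for i, sec in enumerate(sections):
        if i + 1 < len(sections):
            graph[i].add(i + 1)
            graph[i + 1].add(i)
        for j, other in enumerate(sections):
            if i != j and other.title.lower() in sec.text.lower():
                graph[i].add(j)
                graph[j].add(i)
    return graph


def context_for(index, sections, graph, max_chars=200):
    """Collect short excerpts from linked sections to prepend to a prompt."""
    parts = []
    for j in sorted(graph[index]):
        parts.append(f"[{sections[j].title}] {sections[j].text[:max_chars]}")
    return "\n".join(parts)


sections = [
    Section("Introduction", "We study long-document summarization."),
    Section("Method", "Our model uses sparse attention."),
    Section("Experiments", "Accuracy improves on arXiv papers."),
    Section("Conclusion", "The Method generalizes to other domains."),
]
graph = build_section_graph(sections)
# The conclusion is linked to "Experiments" (adjacent) and "Method" (mentioned).
print(context_for(3, sections, graph))
```

Keeping the excerpts short bounds the prompt size; a real pipeline would likely pass section summaries rather than raw leading text.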
The system is organized as a modular pipeline:
- Document extraction
  - GROBID-style section parsing and metadata extraction
  - PDF text and media extraction with PyMuPDF
  - Section graph construction supporting contextual summarization
- LLM-driven summarization
  - Section-level summarization using LLaMA through either a local GGUF model or the Ollama API
  - Final summary composition from priority-ranked sections
  - Domain-specific adaptation for legal, medical, government, and general documents
- Factual auditing and revision
  - Support scoring between summary sentences and source sentences
  - Contradiction detection based on negation and numeric alignment
  - Audit-driven summary revision to remove unsupported claims
- Multi-document literature synthesis
  - Cross-paper highlight extraction
  - Combined trends, common findings, and differences
- Media metrics
  - Figure/table assignment coverage and alignment
  - Caption and preview quality assessment
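The factual auditing stage listed above can be illustrated with a small, dependency-free sketch. It assumes support is measured as token overlap between a summary sentence and its best-matching source sentence, and that contradictions are flagged when negation words or numbers disagree. The function names, the negation word list, and the 0.5 threshold are illustrative assumptions; the repository's actual scoring may differ.

```python
import re

NEGATIONS = {"no", "not", "never", "none", "without", "cannot"}  # assumed word list


def tokens(sentence):
    """Lowercase word and number tokens."""
    return re.findall(r"[a-z]+|\d+(?:\.\d+)?", sentence.lower())


def numbers(sentence):
    """Set of numeric strings appearing in the sentence."""
    return set(re.findall(r"\d+(?:\.\d+)?", sentence))


def support_score(claim, source):
    """Fraction of the claim's distinct tokens that also appear in the source."""
    claim_tokens = set(tokens(claim))
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & set(tokens(source))) / len(claim_tokens)


def audit_sentence(claim, source_sentences, min_support=0.5):
    """Find the best-supporting source sentence and flag simple contradictions."""
    best = max(source_sentences, key=lambda s: support_score(claim, s))
    score = support_score(claim, best)
    claim_neg = bool(NEGATIONS & set(tokens(claim)))
    best_neg = bool(NEGATIONS & set(tokens(best)))
    claim_nums, best_nums = numbers(claim), numbers(best)
    return {
        "best_match": best,
        "support": round(score, 3),
        "supported": score >= min_support,
        "negation_mismatch": claim_neg != best_neg,
        # A number in the claim that is absent from the best match is suspicious.
        "numeric_mismatch": bool(claim_nums) and not claim_nums <= best_nums,
    }


def revise(summary_sentences, source_sentences):
    """Keep only sentences that are supported and raise no contradiction flag."""
    kept = []
    for sent in summary_sentences:
        report = audit_sentence(sent, source_sentences)
        flagged = report["negation_mismatch"] or report["numeric_mismatch"]
        if report["supported"] and not flagged:
            kept.append(sent)
    return kept


source = [
    "The model reaches 92.5 accuracy on the test set.",
    "Training does not require labeled data.",
]
summary = [
    "The model reaches 95.0 accuracy on the test set.",  # wrong number
    "Training requires labeled data.",                   # dropped negation
    "The model reaches 92.5 accuracy on the test set.",  # faithful
]
print(revise(summary, source))  # only the faithful sentence survives
```

Lexical overlap is a deliberately crude stand-in here: it misses paraphrases and treats any added number as a mismatch, so it errs toward removing claims rather than keeping unsupported ones.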
The evaluation framework produces quantitative metrics for comparison between a baseline summarization pipeline and the proposed structure-aware approach.
Representative results from a single long-document experiment on an arXiv paper sample:
- ROUGE-1 F1: 0.1277 → 0.1346
- ROUGE-2 F1: 0.0483 → 0.0957
- ROUGE-L F1: 0.0747 → 0.0832
- Semantic proxy score: 0.4532 → 0.5650
- Factual consistency score: 0.3235 → 0.5022
- Section coverage: 0.60 → 0.80
- Structure coherence signal: 0.00 → 0.1663
On this single-paper experiment, every reported metric improves under the structure-aware approach, which suggests better summary relevance, structure preservation, and evidence alignment from section-aware selection and graph-context summarization.
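For readers unfamiliar with the ROUGE figures above, the sketch below shows how a ROUGE-N F1 score can be computed from clipped n-gram overlap. It assumes plain lowercase whitespace tokenization with no stemming; the evaluation code in this repository may preprocess text differently, so this illustrates the metric and does not reproduce the reported numbers.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1: harmonic mean of n-gram precision and recall."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection clips each n-gram count to the smaller of the two.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "the model uses sparse attention for long documents"
candidate = "the model uses attention for documents"
print(round(rouge_n_f1(candidate, reference, n=1), 4))  # 0.8571
print(round(rouge_n_f1(candidate, reference, n=2), 4))  # 0.5
```

The bigram score drops faster than the unigram score when words are skipped, which is why ROUGE-2 is often read as a rough signal of phrase-level fidelity.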
This project includes research-ready components for evaluating summarization quality and media-aware document understanding:
- `research_paper_novelty_experiments.ipynb` for reproducible experimentation
- `research_experiment_framework.py` with evaluation, auditing, and multi-document summarization logic
- `run_research_experiments.py` for end-to-end experiment execution and metric output generation
- `outputs/tables/` containing publication-ready CSV and LaTeX tables for metrics and ablation analysis
- Python 3.12
- Streamlit for user-facing dashboard and interactive document exploration
- PyMuPDF for PDF parsing and figure cropping
- LLaMA model integration via Ollama API or local GGUF runtime
- Custom summarization and evaluation pipeline in Python
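As a rough illustration of the Ollama integration path, the sketch below builds a non-streaming request for Ollama's `/api/generate` endpoint using only the standard library. The model name `llama3`, the prompt wording, and the function names are placeholders chosen for this example; the project's actual prompts and client code are not shown in this README.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_summary_request(section_title, section_text, model="llama3"):
    """Build a non-streaming generate request for one section.

    The model name and prompt wording are placeholders, not the project's.
    """
    prompt = (
        f"Summarize the '{section_title}' section of a research paper "
        f"in two or three sentences.\n\n{section_text}"
    )
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_section(section_title, section_text, model="llama3", timeout=120):
    """Send the request to a running Ollama server and return the generated text."""
    request = build_summary_request(section_title, section_text, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling `summarize_section("Method", method_text)` requires a local Ollama server with the chosen model already pulled. Separating request construction from the network call keeps the prompt logic testable offline.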
This repository is built around academic PDF summarization for long documents. Sample content and experiment inputs include:
- `data/2004.05150v2.pdf` as a representative arXiv research paper
- `research_paper_novelty_experiments.ipynb` for evaluation workflows
- `research_experiment_results.json` recording experiment outputs and metric comparisons
Primary deliverables in this repository:
- `Structured_Summary.txt` and `structured_summary_output.txt`
- `research_experiment_results.json` with baseline and structure-aware metrics
- `outputs/tables/` for publication-ready results and ablation tables
- `app.py` Streamlit interface for interactive paper summarization
This project demonstrates an end-to-end, deployment-ready pipeline that bridges academic PDF parsing with modern LLM summarization while emphasizing structure, factual rigor, and research evaluation. It is intended for technical reviewers and recruiters who want to see a concrete engineering and research outcome rather than only run instructions.