A novel family of medical large language models with 13B/70B parameters that achieves SOTA performance on various medical tasks
Updated Jan 15, 2025 - Python
Cross-type Biomedical Named Entity Recognition with Deep Multi-task Learning (Bioinformatics'19)
Bioformer: an efficient BERT model for biomedical text mining
[EMNLP 2024] This is the code for our paper "BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers".
A PMC ID in. Clean, loss-aware article JSON out. Parse PubMed Central and JATS XML for biomedical AI, RAG, search, and literature pipelines.
This repository contains the code used for distillation and fine-tuning of the compact biomedical transformers introduced in the paper "On The Effectiveness of Compact Biomedical Transformers".
Systematic evaluation of hallucination risks in Large Language Models (GPT-4, Claude 3, Gemini Pro) for clinical proteomics and mass spectrometry interpretation. Production-ready detection framework with comprehensive benchmarks.
Graph-based RAG system for biomedical nutrigenetic knowledge discovery. Enables natural language queries on gene-nutrient interactions, supports personalized nutrition counseling, and runs 100% locally with Ollama LLMs and SBERT embeddings.
Multi-agent AI system for automated biomedical claim verification using LangGraph, open-source LLMs, and hybrid RAG over extended scientific literature (PMC/Europe PMC). Built as a Big Data Engineering project @ UNINA.
BERT-for-BioNLP-OST2019-AGAC-Task2
Grounded biomedical RAG agent with a frontier-vs-open evaluation harness.
RAG pipeline for medical question-answering. Fuses lexical and dense retrieval (MedCPT, Contriever, Specter + FAISS) with OpenAI, Gemini, and HuggingFace LLMs. Supports iterative multi-round reasoning, strict typing, structured observability, and a clean layered architecture.
AGAC-BioNL-OST2009-Task1 BERT+CRF
Implements relation extraction for biomedical texts using Hard Negative Mining to improve accuracy in identifying complex entity relationships. Includes code for data processing, training, and evaluation with BioC-format datasets.
Cancer-Alterome is a comprehensive and curated dataset that focuses on the investigation of regulatory events caused by gene alteration in the context of cancer.
SOEA-Plus (PDEMC): 3-task biomedical metacognition benchmark evaluating LLM metacognitive control across 2 frontier models on 300 real PubMed examples. Reveals the Control Collapse Gap.
Clinical trial document intelligence pipelines using medallion architecture. Classification (87 categories) + NER (8 entity types) on Databricks.