Reinforcement + Generative Optimization for Automated Software Testing with Generative AI (Optimizer Agent)
RQ2. What reward formulation best balances coverage gain, bug discovery, redundancy reduction, and cost/time?
RQ3. Which optimization strategy (policy-gradient RL, contextual bandits, Bayesian optimization, or GA) gives the best quality-per-cost trade-off across projects of different sizes?
RQ4. How well do improvements generalize across repositories, languages, and test frameworks?
1. Generator (LLM + toolformer style context): Proposes test cases (unit/property/fuzz variants).
2. Executor (sandbox): Runs tests, collects coverage, failures, runtime, flakiness, mutants killed.
3. Selector/Memory: Deduplicates via similarity (AST + embeddings) and keeps a pool.
4. Optimizer Agent: Uses the feedback to choose the next generation action (prompt template, seed test to mutate, temperature, focus file/function, hypothesis strategy, fuzz budget).
5. Convergence: Stops when marginal gains fall below a threshold or budget exhausted.
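The five components above form a closed loop. A minimal sketch of that loop, with all component implementations as stand-in callables (hypothetical names — a real system would call the LLM, a sandboxed runner, etc.):

```python
def run_loop(generate, execute, select, choose_action, budget, eps=1e-3, patience=3):
    """Generate -> execute -> select -> optimize loop.

    Iterates until the per-round reward stays below `eps` for `patience`
    consecutive rounds (step 5, Convergence) or the budget is exhausted.
    `generate`, `execute`, `select`, `choose_action` are the Generator,
    Executor, Selector, and Optimizer Agent respectively (stubs here).
    """
    pool, history, stale = [], [], 0
    action = choose_action(None)              # initial generation knobs
    for _ in range(budget):
        batch = generate(action)              # 1. Generator proposes tests
        feedback = execute(batch)             # 2. Executor: coverage, failures, cost
        pool = select(pool, batch, feedback)  # 3. Selector dedupes, keeps pool
        reward = feedback["reward"]
        history.append(reward)
        stale = stale + 1 if reward < eps else 0
        if stale >= patience:                 # 5. Convergence: marginal gains too low
            break
        action = choose_action(feedback)      # 4. Optimizer picks the next action
    return pool, history
```

The `feedback["reward"]` field is assumed to carry the scalarized reward defined in the next section.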
• Coverage gain: ΔC = C_new - C_prev (line/branch/path), normalized to [0,1]
• Mutation gain: ΔM = (killed_new - killed_prev) / total mutants
• Failure discovery: F = unique failing assertions / |B| (de-duped by stack-trace hash)
• Redundancy penalty: D = mean cosine sim(emb(t_i), nearest in pool) ∈ [0,1]
• Cost/time: K = exec time_B / budget ∈ [0,1]
Multi-objective scalarized reward:
R(B) = α·ΔC + β·ΔM + γ·F - λ·D - μ·K
with (α+β+γ=1); tune (λ,μ) by budget.
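The scalarized reward is a direct weighted sum of the terms above. A minimal sketch (coverage extraction, embeddings, and timing are assumed to be computed upstream; default weights are illustrative only):

```python
def batch_reward(dC, dM, fail_rate, redundancy, cost,
                 alpha=0.5, beta=0.3, gamma=0.2, lam=0.3, mu=0.2):
    """Scalarized multi-objective reward R(B) for a test batch B.

    dC, dM     : coverage / mutation gains, each normalized to [0, 1]
    fail_rate  : unique failing assertions / |B| (de-duped by stack-trace hash)
    redundancy : mean cosine similarity to nearest pooled test, in [0, 1]
    cost       : batch execution time / budget, in [0, 1]
    alpha + beta + gamma must sum to 1; lam, mu are tuned to the budget.
    """
    assert abs(alpha + beta + gamma - 1.0) < 1e-9
    return (alpha * dC + beta * dM + gamma * fail_rate
            - lam * redundancy - mu * cost)
```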
B. Policy-Gradient RL (fine control): State = coverage map + deficit hotspots; Action = generation knobs; Reward = R. Use PPO with action masking (invalid knobs pruned).
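The action-masking part of the PPO setup can be illustrated in isolation: invalid knob combinations get zero probability by setting their logits to -inf before the softmax. A sketch assuming at least one valid action (policy-network logits and the state-derived validity mask are computed elsewhere):

```python
import math

def masked_policy(logits, valid):
    """Return action probabilities with invalid actions masked out.

    `logits` come from the policy network; `valid` is a boolean mask
    derived from the current repo/coverage state. Setting a logit to
    -inf makes exp(logit - max) exactly 0.0, so pruned knobs can never
    be sampled and gradients never flow to them.
    """
    masked = [l if ok else float("-inf") for l, ok in zip(logits, valid)]
    m = max(masked)                              # stabilize the softmax
    exps = [math.exp(l - m) for l in masked]     # exp(-inf) == 0.0
    z = sum(exps)
    return [e / z for e in exps]
```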
C. Bayesian Optimization (few but high-value steps): Black-box optimize R over continuous knobs (temperature, fuzz budget) + categorical (prompt template) via mixed-BO (SMAC/GP+TPE).
D. Genetic Search (diverse test pools): Evolve test cases and prompts; crossover = splice assertions & inputs; mutation = boundary value twists, fuzz seeds.
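The crossover and mutation operators above can be sketched on a toy representation. Real test cases would be ASTs; here a test is a hypothetical (inputs, assertion) pair, purely for illustration:

```python
import random

def crossover(parent_a, parent_b, rng):
    """Splice: take inputs from one parent and assertions from the other."""
    inputs_a, asserts_a = parent_a
    inputs_b, asserts_b = parent_b
    return (inputs_a, asserts_b) if rng.random() < 0.5 else (inputs_b, asserts_a)

def mutate(test, rng, boundary_values=(0, -1, 1, 2**31 - 1)):
    """Boundary-value twist: replace one input with an edge-case value."""
    inputs, asserts = test
    inputs = list(inputs)
    inputs[rng.randrange(len(inputs))] = rng.choice(boundary_values)
    return (tuple(inputs), asserts)

def evolve(pop, fitness, rng, elite=2):
    """One generation: keep elites, refill by crossover + mutation of
    parents sampled from the fitter half of the population."""
    ranked = sorted(pop, key=fitness, reverse=True)
    nxt = ranked[:elite]
    while len(nxt) < len(pop):
        a, b = rng.sample(ranked[: max(2, len(pop) // 2)], 2)
        nxt.append(mutate(crossover(a, b, rng), rng))
    return nxt
```

In the full system, `fitness` would be the per-test contribution to the batch reward R(B).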
• Flakiness control: re-run each failing test k times; label it flaky if the re-runs mix passes and failures.
• Invariant/Oracle quality: prefer property-based or metamorphic assertions when available; static analyzers to flag over-fitting assertions.
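The flakiness re-run rule is small enough to state as code. A sketch where `run_test` is a hypothetical runner callable returning True on pass:

```python
def classify_failure(run_test, test_id, k=5):
    """Re-run a failing test k times (flakiness control above).

    A mix of pass and fail outcomes -> "flaky" (excluded from reward);
    k consecutive failures -> "failing" (a real bug candidate).
    """
    results = [run_test(test_id) for _ in range(k)]
    if not any(results):
        return "failing"
    return "flaky" if not all(results) else "pass"
```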
• Large: 1–3 real-world repos (backend service, CLI tool).
• Languages: start with Python (pytest + Hypothesis) and one JVM repo (JUnit + PIT).
• B1: Prompt + simple heuristic selection (keep unique filenames/lines touched).
• B2: Search-based testing (e.g., EvoSuite/PBT without LLM).
• Your methods: Bandit, RL (PPO), BO, GA.
• Line/branch/path coverage; Δ vs baseline
• Mutation score (killed/total)
• Bugs found (unique failing tests / confirmed issues)
• Redundancy: avg nearest-neighbor similarity; unique lines/functions touched
Efficiency:
• Quality-per-cost: (ΔC/min), (killed mutants/$)
• Wall time & runs to reach 95% of best coverage (sample-efficiency)
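The sample-efficiency metric above (runs needed to reach 95% of the best coverage) reduces to a first-crossing search over the per-run coverage curve. A minimal sketch:

```python
def runs_to_fraction(coverage_curve, best, frac=0.95):
    """Index (1-based) of the first run whose cumulative coverage reaches
    frac * best; None if the target is never reached within the curve."""
    target = frac * best
    for i, cov in enumerate(coverage_curve, start=1):
        if cov >= target:
            return i
    return None
```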
Stability/Generalization:
• Flake rate; transfer (train choices on Repo A, apply to Repo B)
• Ablations: remove each reward term; freeze generator (no knob changes); swap optimizer.
• Statistics: paired tests with Cliff's delta; bootstrap CIs; report Pareto frontiers.
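Both statistics named above are simple to implement from scratch; a minimal sketch (percentile bootstrap, resampling scores with replacement):

```python
import random

def cliffs_delta(xs, ys):
    """Cliff's delta effect size: P(x > y) - P(x < y) over all pairs, in [-1, 1]."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

def bootstrap_ci(values, stat=lambda v: sum(v) / len(v),
                 n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic of run scores."""
    rng = random.Random(seed)
    boots = sorted(stat([rng.choice(values) for _ in values])
                   for _ in range(n_boot))
    lo = boots[int((alpha / 2) * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```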
• Stopping: no improvement of R for 3 iterations OR budget hit.
2. Optimizer Agent design that adapts generation strategy to repo context.
3. Budget-aware evaluation showing quality-per-cost gains over strong baselines.
4. Reproducible toolkit (scripts + configs) to plug into CI.
• LLM non-determinism → fix seeds & log prompts; use temperature schedules.
• Cost blow-ups → cap batch size, early-stop low-reward arms, use BO for expensive knobs.