Paper Search

Found 23 papers

LLM-VLM Fusion Framework for Autonomous Maritime Port Inspection using a Heterogeneous UAV-USV System

Muhayy Ud Din, Waseem Akram, Ahsan B. Bakht, Irfan Hussain

January 21, 2026

cs.RO arXiv: 2601.13096

Maritime port inspection plays a critical role in ensuring safety, regulatory compliance, and operational efficiency in complex maritime environments. However, existing

llm-vlm fusion autonomous maritime port inspection heterogeneous uav-usv system symbolic mission planning semantic inspection dependency graphs natural language instruction context-aware monitoring real-time perception mbzirc maritime simulator lightweight on-board design gpt-4 gpt-3.5-turbo gemini llama smolvlm florence-2 qwen2-vl moondream2 git-base a* path planning pid control regulatory compliance assessment cooperative robotic platforms ai model benchmarking

cs.CV cs.RO

AgenticRed: Optimizing Agentic Systems for Automated Red-teaming

Jiayi Yuan, Jonathan Nöther, Natasha Jaques, Goran Radanović

January 21, 2026

cs.AI arXiv: 2601.13518

While recent automated red-teaming methods show promise for systematically exposing model vulnerabilities, most existing

agenticred automated red-teaming agentic systems evolutionary algorithms in-context learning system design ai safety jailbreaking attack success rate transferability meta agent search evolutionary selection self-reflection reward shaping refusal suppression crossover mutation adversarial reasoning autodan-turbo harmbench strongreject llama-2-7b llama-3-8b gpt-3.5-turbo judge function

cs.NE cs.AI

ComGPT: Detecting Local Community Structure with Large Language Models

Wenjian Luo, Yiwen Zhang, Li Ni, Haowen Shen, Lin Mu

January 22, 2026

cs.SI arXiv: 2408.06658

Large Language Models (LLMs), like GPT-3.5-turbo, have demonstrated the ability to understand graph structures and have achieved

local community detection large language models seed expansion algorithm comgpt gpt-3.5-turbo graph reasoning graph encoding comincident node selection guide seed-dependent problem community diffusion free rider effect local modularity graph structure prompt engineering potential node identification node supplementation community knowledge subgraph jaccard index fscore network topology domain knowledge token constraints heuristic rules

cs.SI

Multi-Persona Thinking for Bias Mitigation in Large Language Models

Yuxing Chen, Guoqing Luo, Zijun Wu, Lili Mou

January 23, 2026

cs.CL arXiv: 2601.15488

Large Language Models (LLMs) exhibit significant social biases that can perpetuate harmful stereotypes and unfair outcomes. In this paper

large language models bias mitigation multi-persona thinking dialectical reasoning social bias inference-time framework stereotypes persona assignment self-debiasing prompting-based strategies bbq benchmark stereoset fairness multi-agent debate neutral persona iterative reasoning social identities cognitive debiasing perspective-taking accuracy diff-bias score llama-3.1 gpt-3.5-turbo natural language processing role-playing prompts

cs.CL cs.AI

Discourse Features Enhance Detection of Document-Level Machine-Generated Content

Yupei Li, Manuel Milling, Lucia Specia, Björn W. Schuller

January 27, 2026

cs.CL arXiv: 2412.12679

The availability of high-quality APIs for Large Language Models (LLMs) has facilitated the widespread creation of Machine

machine-generated content mgc detection discourse analysis dtransformer penn discourse treebank pdtb document-level detection paraphrased content large language models structural features paralfqa parawp plagbench m4 dataset dipper transformer hierarchical models cross-attention mechanism natural language understanding academic plagiarism misinformation semantic features rhetorical structure theory gpt-3.5-turbo roberta

cs.CL

A Tale of Two Scripts: Transliteration and Post-Correction for Judeo-Arabic

Juan Moreno Gonzalez, Bashar Alhafni, Nizar Habash

January 30, 2026

cs.CL arXiv: 2507.04746

Judeo-Arabic refers to Arabic variants historically spoken by Jewish communities across the Arab world, primarily

judeo-arabic transliteration hebrew script arabic script post-correction grammatical error correction large language models llms gpt-3.5-turbo gpt-4o charmapper sequence-to-sequence sweet morphosyntactic tagging machine translation code-switching diacritics upper dot diacritic al-khazari sefaria benchmark evaluation zero-shot downstream tasks arabic nlp orthographic errors

cs.CL

Leveraging LLMs for Translating and Classifying Mental Health Data

Konstantinos Skianis, A. Seza Doğruöz, John Pavlopoulos

February 03, 2026

cs.CL arXiv: 2410.12985

Large language models (LLMs) are increasingly used in medical fields. In mental health support, the early identification of linguistic markers

large language models mental health detection depression severity classification multilingual nlp cross-lingual transfer machine translation gpt-3.5-turbo zero-shot learning text classification user-generated content reddit dataset depseverity dataset greek language processing low-resource languages prompt-based classification precision recall f1-score imbalanced dataset mental health linguistics markers social media analysis automatic translation evaluation cross-lingual evaluation error analysis clinical decision support mental health ai ethics human supervision in healthcare ai

cs.CL

Smarter AI Through Prompt Engineering: Insights and Case Studies from Data Science Application

Snehasish Paul, Rohit Kumar, Laxman Das

February 03, 2026

cs.DL arXiv: 2602.00337

The field of prompt engineering is becoming an essential phenomenon in artificial intelligence. It is altering how data scientists interact with large language models (LLMs

prompt engineering large language models gpt-4 gpt-3.5-turbo claude chain-of-thought prompting zero-shot learning few-shot learning retrieval-augmented generation automatic prompt optimization gradient-based optimization agent-based optimization multi-objective optimization nsga-ii bayesian optimization schema matching clinical named entity recognition phishing detection data preprocessing financial question answering finqa dataset convfinqa dataset materials property extraction business intelligence fine-tuning

cs.DL

Evaluating Prompt Engineering Strategies for Sentiment Control in AI-Generated Texts

Sophie Jentzsch, Kerstin Sahler

February 09, 2026

cs.CL arXiv: 2602.06692

The groundbreaking capabilities of Large Language Models (LLMs) offer new opportunities for enhancing human-computer interaction

large language models gpt-3.5-turbo prompt engineering emotion steering sentiment control emotion-adaptive ai few-shot prompting zero-shot prompting chain-of-thought prompting fine-tuning controlled text generation ekman's six basic emotions emotion score weighted f1 score distil-roberta emotion classifier bertscore lexical diversity distinct-n metric flesch reading ease score human-written examples in-context learning text style transfer goemotions dataset meld dataset nrc emotion lexicon

cs.CL

LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks

Andreas Happe, Aaron Kaplan, Juergen Cito

February 12, 2026

cs.CR arXiv: 2310.11409

Penetration-testing is crucial for identifying system vulnerabilities, with privilege-escalation being a critical subtask to gain elevated access to

large language models autonomous penetration testing linux privilege escalation cybersecurity offensive security llm agents gpt-4-turbo gpt-3.5-turbo llama3 hackingbuddygpt privilege escalation benchmark virtual machine testbed context management strategies state summarization reflection pattern in-context learning enumeration tools high-level guidance cron-based vulnerabilities suid misconfiguration sudo exploitation information disclosure cost analysis human baseline comparison mitre attack framework

cs.AI cs.CR

Fostering Collective Discourse: A Distributed Role-Based Approach to Online News Commenting

Yoojin Hong, Yersultan Doszhan, Joseph Seering

February 13, 2026

cs.HC arXiv: 2510.02766

Current news commenting systems are designed based on implicitly individualistic assumptions, where discussion is the result of a series of

news commenting systems distributed roles collective discourse online news collaborative discussion public sphere moderation clustering summarization threading human-computer interaction social computing user engagement perspective diversity argument strength emotional expression politeness mixed-methods evaluation browser extension crowd-based moderation deliberative democracy gpt-3.5-turbo participatory journalism information architecture structured workflows

cs.HC

ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools

Yang Liu, Shaofeng Yin, Ting Lei

March 05, 2026

cs.AI arXiv: 2508.03284

Integrating external tools into Large Foundation Models (LFMs) has emerged as a promising approach to enhance their problem-solving

toolvqa visual question answering multi-step reasoning external tools large foundation models toolengine data generation pipeline depth-first search in-context example matching longest common subsequence multimodal dataset llava-7b tool-augmented vqa out-of-distribution benchmarks instruction fine-tuning tool-use agent vision-language models implicit reasoning real-world visual contexts argument prediction answer summarization tool selection gpt-3.5-turbo image-guided dfs multimodal tool-use

cs.AI

Large Language Models are Contrastive Reasoners

Liang Yao

March 06, 2026

cs.CL arXiv: 2403.08211

Prompting methods play a crucial role in enhancing the capabilities of pre-trained large language models (LLMs). We explore how contrastive

large language models contrastive prompting complex reasoning zero-shot prompting chain-of-thought arithmetic reasoning commonsense reasoning symbolic reasoning gpt-4 gpt-3.5-turbo gsm8k aqua-rat two-stage prompting self-awareness error avoidance prompt engineering self-consistency few-shot prompting strategyqa svamp multiarith contrastive reasoners instruction tuning reinforcement learning from human feedback trigger sentence

cs.CL cs.AI

Enhancing Consistency of Werewolf AI through Dialogue Summarization and Persona Information

Yoshiki Tanaka, Takumasa Kaneko, Hiroki Onozeki, Natsumi Ezure, Ryuichi Uehara, Zhiyang Qi, Tomoya Higuchi, Ryutaro Asahara, Michimasa Inaba

March 10, 2026

cs.CL arXiv: 2603.07111

The Werewolf Game is a communication game where players' reasoning and discussion skills are essential. In this study, we present a Werewolf AI

werewolf game large language models dialogue summarization persona information aiwolfdial 2024 multi-agent systems conversational ai reasoning capabilities contextual consistency chain-of-thought prompting incomplete information game natural language generation agent architecture prompt engineering game theory in-context learning speech strategies characterization gpt-4 gpt-3.5-turbo social deduction games dialogue history compression role-playing decision making human-computer interaction

cs.CL cs.AI

Not All Queries Need Deep Thought: CoFiCot for Adaptive Coarse-to-fine Stateful Refinement

Dongxu Zhang, Hongqiang Lin, Yiding Sun, Pengyu Wang, Qirui Wang, Ning Yang, Jihua Zhu

March 10, 2026

cs.CL arXiv: 2603.08251

Scaling test-time computation enhances LLM reasoning ability but faces a uniform computation paradox. Allocating identical resources leads to over-correction on simple

large language models test-time computation chain of thought adaptive computation coarse-to-fine refinement stateful sequential correction process reward models outcome reward models error localization semantic entropy consensus reliability reasoning depth iterative refinement uniform computation paradox self-correction mathematical reasoning commonsense reasoning gsm8k math dataset mmlu llama-3 gpt-3.5-turbo multi-metric classifier inference strategies token efficiency

cs.CL

Reactive Writers: How Co-Writing with AI Changes How We Engage with Ideas

Mor Naaman, Marianne Aubin Le Quéré, Maurice Jakesch, Advait Bhat

March 12, 2026

cs.HC arXiv: 2603.10374

Emerging experimental evidence shows that writing with AI assistance can change both the views people express in writing and the opinions they hold afterwards. Yet, we lack

human-ai co-writing ai writing assistants large language models gpt-3 gpt-3.5-turbo gpt-4o mixed-methods study retrospective verbal protocol interviews interaction log analysis topic modeling bertopic hierarchical agglomerative clustering t-sne visualization algorithmic agenda-setting ai persuasion opinion change cognitive offloading automation bias default effect processing fluency reactive writing inline autocomplete human-centered computing social media discourse human-ai interaction

cs.HC cs.AI

SlovKE: A Large-Scale Dataset and LLM Evaluation for Slovak Keyphrase Extraction

Marek Šuppa, David Števaňák

March 17, 2026

cs.CL arXiv: 2603.15523

Keyphrase extraction for morphologically rich, low-resource languages remains understudied, largely due to the scarcity of suitable evaluation

keyphrase extraction slovak language morphologically rich languages low-resource languages slavic nlp slovke benchmark dataset scientific abstracts large language models llm evaluation keyllm gpt-3.5-turbo yake textrank keybert slovakbert morphological mismatch exact matching partial matching f1 score lemmatization unsupervised learning canonical forms surface forms manual evaluation

cs.CL cs.AI

Biased AI can Influence Political Decision-Making

Jillian Fisher, Shangbin Feng, Robert Aron, Thomas Richardson, Yejin Choi, Daniel W. Fisher, Jennifer Pan, Yulia Tsvetkov, Katharina Reinecke

March 20, 2026

cs.HC arXiv: 2410.06415

As modern large language models (LLMs) become integral to everyday tasks, concerns about their inherent biases and their potential impact on human decision-making have

large language models llms partisan bias political decision-making behavioral bias human-ai interaction opinion formation gpt-3.5-turbo prompting prefix identifiers interactive experiments ordinal logistic regression anova framing dimensions persuasion techniques ai education bias mitigation political partisanship budget allocation task topic opinion task political compass test computational social science human cognition public discourse social bias

cs.HC cs.AI

Enhancing Team Diversity with Generative AI: A Novel Project Management Framework

Yuming Li, Johnny Chan

April 02, 2026

cs.CY arXiv: 2502.05181

This research-in-progress paper presents a new project management framework that utilises GenAI technology. The framework is designed to address the common challenge of uniform team

generative ai project management team diversity genai agents personality traits big five personality model team composition fine-tuning chatgpt gpt-3.5-turbo friendspersona dataset essay dataset sociological patterns team roles academic research projects homogeneous teams transformer models self-attention mechanism natural language processing multi-label annotation binary classification f1 score human-ai collaboration entrepreneurial success synthetic datasets

cs.LG cs.AI cs.CY

Detecting Reference Errors in Scientific Literature with Large Language Models

Tianmai M. Zhang, Neil F. Abernethy

April 03, 2026

cs.CL arXiv: 2411.06101

Reference errors, such as citation and quotation errors, are common in scientific papers. Such errors can result in the propagation of inaccurate information, but are difficult and

reference errors quotation errors citation errors large language models gpt gpt-3.5-turbo gpt-4-turbo gpt-4o retrieval-augmented generation fine-tuning expert-annotated dataset statement-reference pairs scientific literature scientific publishing peer review fact-checking academic misconduct scientific claim verification grobid llamaindex hallucination error analysis label accuracy fully substantiated unsubstantiated

cs.CL
