Paper Search
Found 23 papers
-
LLM-VLM Fusion Framework for Autonomous Maritime Port Inspection using a Heterogeneous UAV-USV System
Muhayy Ud Din, Waseem Akram, Ahsan B. Bakht, Irfan Hussain
January 21, 2026
cs.RO arXiv: 2601.13096
Maritime port inspection plays a critical role in ensuring safety, regulatory compliance, and operational efficiency in complex maritime environments. However, existing
llm-vlm fusion autonomous maritime port inspection heterogeneous uav-usv system symbolic mission planning semantic inspection dependency graphs natural language instruction context-aware monitoring real-time perception mbzirc maritime simulator lightweight on-board design gpt-4 gpt-3.5-turbo gemini llama smolvlm florence-2 qwen2-vl moondream2 git-base a* path planning pid control regulatory compliance assessment cooperative robotic platforms ai model benchmarking
cs.CV cs.RO
-
AgenticRed: Optimizing Agentic Systems for Automated Red-teaming
Jiayi Yuan, Jonathan Nöther, Natasha Jaques, Goran Radanović
January 21, 2026
cs.AI arXiv: 2601.13518
While recent automated red-teaming methods show promise for systematically exposing model vulnerabilities, most existing
agenticred automated red-teaming agentic systems evolutionary algorithms in-context learning system design ai safety jailbreaking attack success rate transferability meta agent search evolutionary selection self-reflection reward shaping refusal suppression crossover mutation adversarial reasoning autodan-turbo harmbench strongreject llama-2-7b llama-3-8b gpt-3.5-turbo judge function
cs.NE cs.AI
-
ComGPT: Detecting Local Community Structure with Large Language Models
Wenjian Luo, Yiwen Zhang, Li Ni, Haowen Shen, Lin Mu
January 22, 2026
cs.SI arXiv: 2408.06658
Large Language Models (LLMs), like GPT-3.5-turbo, have demonstrated the ability to understand graph structures and have achieved
local community detection large language models seed expansion algorithm comgpt gpt-3.5-turbo graph reasoning graph encoding comincident node selection guide seed-dependent problem community diffusion free rider effect local modularity graph structure prompt engineering potential node identification node supplementation community knowledge subgraph jaccard index fscore network topology domain knowledge token constraints heuristic rules
cs.SI
-
Multi-Persona Thinking for Bias Mitigation in Large Language Models
Yuxing Chen, Guoqing Luo, Zijun Wu, Lili Mou
January 23, 2026
cs.CL arXiv: 2601.15488
Large Language Models (LLMs) exhibit significant social biases that can perpetuate harmful stereotypes and unfair outcomes. In this paper
large language models bias mitigation multi-persona thinking dialectical reasoning social bias inference-time framework stereotypes persona assignment self-debiasing prompting-based strategies bbq benchmark stereoset fairness multi-agent debate neutral persona iterative reasoning social identities cognitive debiasing perspective-taking accuracy diff-bias score llama-3.1 gpt-3.5-turbo natural language processing role-playing prompts
cs.CL cs.AI
-
Discourse Features Enhance Detection of Document-Level Machine-Generated Content
Yupei Li, Manuel Milling, Lucia Specia, Björn W. Schuller
January 27, 2026
cs.CL arXiv: 2412.12679
The availability of high-quality APIs for Large Language Models (LLMs) has facilitated the widespread creation of Machine
machine-generated content mgc detection discourse analysis dtransformer penn discourse treebank pdtb document-level detection paraphrased content large language models structural features paralfqa parawp plagbench m4 dataset dipper transformer hierarchical models cross-attention mechanism natural language understanding academic plagiarism misinformation semantic features rhetorical structure theory gpt-3.5-turbo roberta
cs.CL
-
A Tale of Two Scripts: Transliteration and Post-Correction for Judeo-Arabic
Juan Moreno Gonzalez, Bashar Alhafni, Nizar Habash
January 30, 2026
cs.CL arXiv: 2507.04746
Judeo-Arabic refers to Arabic variants historically spoken by Jewish communities across the Arab world, primarily
judeo-arabic transliteration hebrew script arabic script post-correction grammatical error correction large language models llms gpt-3.5-turbo gpt-4o charmapper sequence-to-sequence sweet morphosyntactic tagging machine translation code-switching diacritics upper dot diacritic al-khazari sefaria benchmark evaluation zero-shot downstream tasks arabic nlp orthographic errors
cs.CL
-
Leveraging LLMs for Translating and Classifying Mental Health Data
Konstantinos Skianis, A. Seza Doğruöz, John Pavlopoulos
February 03, 2026
cs.CL arXiv: 2410.12985
Large language models (LLMs) are increasingly used in medical fields. In mental health support, the early identification of linguistic markers
large language models mental health detection depression severity classification multilingual nlp cross-lingual transfer machine translation gpt-3.5-turbo zero-shot learning text classification user-generated content reddit dataset depseverity dataset greek language processing low-resource languages prompt-based classification precision recall f1-score imbalanced dataset mental health linguistics markers social media analysis automatic translation evaluation cross-lingual evaluation error analysis clinical decision support mental health ai ethics human supervision in healthcare ai
cs.CL
-
Smarter AI Through Prompt Engineering: Insights and Case Studies from Data Science Application
Snehasish Paul, Rohit Kumar, Laxman Das
February 03, 2026
cs.DL arXiv: 2602.00337
The field of prompt engineering is becoming an essential phenomenon in artificial intelligence. It is altering how data scientists interact with large language models (LLMs
prompt engineering large language models gpt-4 gpt-3.5-turbo claude chain-of-thought prompting zero-shot learning few-shot learning retrieval-augmented generation automatic prompt optimization gradient-based optimization agent-based optimization multi-objective optimization nsga-ii bayesian optimization schema matching clinical named entity recognition phishing detection data preprocessing financial question answering finqa dataset convfinqa dataset materials property extraction business intelligence fine-tuning
cs.DL
-
Evaluating Prompt Engineering Strategies for Sentiment Control in AI-Generated Texts
Sophie Jentzsch, Kerstin Sahler
February 09, 2026
cs.CL arXiv: 2602.06692
The groundbreaking capabilities of Large Language Models (LLMs) offer new opportunities for enhancing human-computer interaction
large language models gpt-3.5-turbo prompt engineering emotion steering sentiment control emotion-adaptive ai few-shot prompting zero-shot prompting chain-of-thought prompting fine-tuning controlled text generation ekman's six basic emotions emotion score weighted f1 score distil-roberta emotion classifier bertscore lexical diversity distinct-n metric flesch reading ease score human-written examples in-context learning text style transfer goemotions dataset meld dataset nrc emotion lexicon
cs.CL
-
LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks
Andreas Happe, Aaron Kaplan, Juergen Cito
February 12, 2026
cs.CR arXiv: 2310.11409
Penetration-testing is crucial for identifying system vulnerabilities, with privilege-escalation being a critical subtask to gain elevated access to
large language models autonomous penetration testing linux privilege escalation cybersecurity offensive security llm agents gpt-4-turbo gpt-3.5-turbo llama3 hackingbuddygpt privilege escalation benchmark virtual machine testbed context management strategies state summarization reflection pattern in-context learning enumeration tools high-level guidance cron-based vulnerabilities suid misconfiguration sudo exploitation information disclosure cost analysis human baseline comparison mitre attack framework
cs.AI cs.CR
-
Fostering Collective Discourse: A Distributed Role-Based Approach to Online News Commenting
Yoojin Hong, Yersultan Doszhan, Joseph Seering
February 13, 2026
cs.HC arXiv: 2510.02766
Current news commenting systems are designed based on implicitly individualistic assumptions, where discussion is the result of a series of
news commenting systems distributed roles collective discourse online news collaborative discussion public sphere moderation clustering summarization threading human-computer interaction social computing user engagement perspective diversity argument strength emotional expression politeness mixed-methods evaluation browser extension crowd-based moderation deliberative democracy gpt-3.5-turbo participatory journalism information architecture structured workflows
cs.HC
-
ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools
Yang Liu, Shaofeng Yin, Ting Lei
March 05, 2026
cs.AI arXiv: 2508.03284
Integrating external tools into Large Foundation Models (LFMs) has emerged as a promising approach to enhance their problem-solving
toolvqa visual question answering multi-step reasoning external tools large foundation models toolengine data generation pipeline depth-first search in-context example matching longest common subsequence multimodal dataset llava-7b tool-augmented vqa out-of-distribution benchmarks instruction fine-tuning tool-use agent vision-language models implicit reasoning real-world visual contexts argument prediction answer summarization tool selection gpt-3.5-turbo image-guided dfs multimodal tool-use
cs.AI
-
Large Language Models are Contrastive Reasoners
Liang Yao
March 06, 2026
cs.CL arXiv: 2403.08211
Prompting methods play a crucial role in enhancing the capabilities of pre-trained large language models (LLMs). We explore how contrastive
large language models contrastive prompting complex reasoning zero-shot prompting chain-of-thought arithmetic reasoning commonsense reasoning symbolic reasoning gpt-4 gpt-3.5-turbo gsm8k aqua-rat two-stage prompting self-awareness error avoidance prompt engineering self-consistency few-shot prompting strategyqa svamp multiarith contrastive reasoners instruction tuning reinforcement learning from human feedback trigger sentence
cs.CL cs.AI
-
Enhancing Consistency of Werewolf AI through Dialogue Summarization and Persona Information
Yoshiki Tanaka, Takumasa Kaneko, Hiroki Onozeki, Natsumi Ezure, Ryuichi Uehara, Zhiyang Qi, Tomoya Higuchi, Ryutaro Asahara, Michimasa Inaba
March 10, 2026
cs.CL arXiv: 2603.07111
The Werewolf Game is a communication game where players' reasoning and discussion skills are essential. In this study, we present a Werewolf AI
werewolf game large language models dialogue summarization persona information aiwolfdial 2024 multi-agent systems conversational ai reasoning capabilities contextual consistency chain-of-thought prompting incomplete information game natural language generation agent architecture prompt engineering game theory in-context learning speech strategies characterization gpt-4 gpt-3.5-turbo social deduction games dialogue history compression role-playing decision making human-computer interaction
cs.CL cs.AI
-
Not All Queries Need Deep Thought: CoFiCot for Adaptive Coarse-to-fine Stateful Refinement
Dongxu Zhang, Hongqiang Lin, Yiding Sun, Pengyu Wang, Qirui Wang, Ning Yang, Jihua Zhu
March 10, 2026
cs.CL arXiv: 2603.08251
Scaling test-time computation enhances LLM reasoning ability but faces a uniform computation paradox. Allocating identical resources leads to over-correction on simple
large language models test-time computation chain of thought adaptive computation coarse-to-fine refinement stateful sequential correction process reward models outcome reward models error localization semantic entropy consensus reliability reasoning depth iterative refinement uniform computation paradox self-correction mathematical reasoning commonsense reasoning gsm8k math dataset mmlu llama-3 gpt-3.5-turbo multi-metric classifier inference strategies token efficiency
cs.CL
-
Reactive Writers: How Co-Writing with AI Changes How We Engage with Ideas
Mor Naaman, Marianne Aubin Le Quéré, Maurice Jakesch, Advait Bhat
March 12, 2026
cs.HC arXiv: 2603.10374
Emerging experimental evidence shows that writing with AI assistance can change both the views people express in writing and the opinions they hold afterwards. Yet, we lack
human-ai co-writing ai writing assistants large language models gpt-3 gpt-3.5-turbo gpt-4o mixed-methods study retrospective verbal protocol interviews interaction log analysis topic modeling bertopic hierarchical agglomerative clustering t-sne visualization algorithmic agenda-setting ai persuasion opinion change cognitive offloading automation bias default effect processing fluency reactive writing inline autocomplete human-centered computing social media discourse human-ai interaction
cs.HC cs.AI
-
SlovKE: A Large-Scale Dataset and LLM Evaluation for Slovak Keyphrase Extraction
Marek Šuppa, David Števaňák
March 17, 2026
cs.CL arXiv: 2603.15523
Keyphrase extraction for morphologically rich, low-resource languages remains understudied, largely due to the scarcity of suitable evaluation
keyphrase extraction slovak language morphologically rich languages low-resource languages slavic nlp slovke benchmark dataset scientific abstracts large language models llm evaluation keyllm gpt-3.5-turbo yake textrank keybert slovakbert morphological mismatch exact matching partial matching f1 score lemmatization unsupervised learning canonical forms surface forms manual evaluation
cs.CL cs.AI
-
Biased AI can Influence Political Decision-Making
Jillian Fisher, Shangbin Feng, Robert Aron, Thomas Richardson, Yejin Choi, Daniel W. Fisher, Jennifer Pan, Yulia Tsvetkov, Katharina Reinecke
March 20, 2026
cs.HC arXiv: 2410.06415
As modern large language models (LLMs) become integral to everyday tasks, concerns about their inherent biases and their potential impact on human decision-making have
large language models llms partisan bias political decision-making behavioral bias human-ai interaction opinion formation gpt-3.5-turbo prompting prefix identifiers interactive experiments ordinal logistic regression anova framing dimensions persuasion techniques ai education bias mitigation political partisanship budget allocation task topic opinion task political compass test computational social science human cognition public discourse social bias
cs.HC cs.AI
-
Enhancing Team Diversity with Generative AI: A Novel Project Management Framework
Yuming Li, Johnny Chan
April 02, 2026
cs.CY arXiv: 2502.05181
This research-in-progress paper presents a new project management framework that utilises GenAI technology. The framework is designed to address the common challenge of uniform team
generative ai project management team diversity genai agents personality traits big five personality model team composition fine-tuning chatgpt gpt-3.5-turbo friendspersona dataset essay dataset sociological patterns team roles academic research projects homogeneous teams transformer models self-attention mechanism natural language processing multi-label annotation binary classification f1 score human-ai collaboration entrepreneurial success synthetic datasets
cs.LG cs.AI cs.CY
-
Detecting Reference Errors in Scientific Literature with Large Language Models
Tianmai M. Zhang, Neil F. Abernethy
April 03, 2026
cs.CL arXiv: 2411.06101
Reference errors, such as citation and quotation errors, are common in scientific papers. Such errors can result in the propagation of inaccurate information, but are difficult and
reference errors quotation errors citation errors large language models gpt gpt-3.5-turbo gpt-4-turbo gpt-4o retrieval-augmented generation fine-tuning expert-annotated dataset statement-reference pairs scientific literature scientific publishing peer review fact-checking academic misconduct scientific claim verification grobid llamaindex hallucination error analysis label accuracy fully substantiated unsubstantiated
cs.CL