Practice and innovation of artificial intelligence shaping the paradigm of surgery

XIA Feng, CHEN Xiao-ping

Chinese Journal of Practical Surgery, 2026, Vol. 46, Issue (5): 614-619. DOI: 10.19538/j.cjps.issn1005-2208.2026.05.03

Academician Forum

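Abstract

Artificial intelligence (AI) is driving a profound shift in the paradigm of surgery. Preoperatively, AI refines imaging diagnosis, enables radiomics-based prediction, and supports dynamic, interactive surgical planning, moving decision-making away from experience and intuition toward data. Intraoperatively, AI acts as a real-time assistant that enhances the surgeon's perception, manipulation, and decision-making through computer vision navigation, surgical workflow analysis, more intelligent surgical robots, and rapid intraoperative diagnosis. Postoperatively, AI prediction models built on continuous data streams enable early warning and proactive prevention of complications and support home-based rehabilitation. AI is also transforming surgical education through objective skill assessment, knowledge support from large language models, and high-fidelity simulation. Clinical translation of AI nevertheless faces several challenges: insufficient clinical research evidence, data and algorithm bottlenecks, difficulties in system integration, and ethical, legal, and regulatory frameworks that are still incomplete. Future directions include building surgical foundation models, developing patient digital twins, and constructing intelligent operating rooms based on human-machine collaboration, ultimately transforming clinical surgery from an "artisanal craft" into an "intelligent science".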

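The idea of early warning from continuous data streams can be illustrated with a minimal Python sketch. It assumes a stream of per-minute vital-sign readings (heart rate and mean arterial pressure, MAP), computes trend features over a rolling window, maps them to a probability with a logistic function, and raises an alert when that probability crosses a threshold. The `Reading` type, the `risk_score` and `monitor` functions, the window size, the threshold, and above all the weights are hypothetical and hand-set for illustration; a clinical model would learn its parameters from data and require prospective, external validation. The only borrowed convention is the 65 mmHg MAP cut-off, a commonly used hypotension threshold.

```python
import math
from collections import deque
from dataclasses import dataclass


@dataclass
class Reading:
    minute: int         # minutes since monitoring started
    heart_rate: float   # beats per minute
    map_mmhg: float     # mean arterial pressure, mmHg


# Hypothetical, hand-set weights for illustration only (not trained on any data).
WEIGHTS = {"bias": -3.0, "hr_trend": 0.5, "map_trend": -1.0, "map_low": 2.0}
HYPOTENSION_MMHG = 65.0  # commonly used MAP threshold for hypotension


def risk_score(window):
    """Map trend features from a rolling window to a probability (logistic function)."""
    first, last = window[0], window[-1]
    span = max(last.minute - first.minute, 1)                # avoid division by zero
    hr_trend = (last.heart_rate - first.heart_rate) / span   # bpm per minute
    map_trend = (last.map_mmhg - first.map_mmhg) / span      # mmHg per minute
    map_low = 1.0 if last.map_mmhg < HYPOTENSION_MMHG else 0.0
    z = (WEIGHTS["bias"]
         + WEIGHTS["hr_trend"] * hr_trend
         + WEIGHTS["map_trend"] * map_trend
         + WEIGHTS["map_low"] * map_low)
    return 1.0 / (1.0 + math.exp(-z))


def monitor(stream, window_size=5, threshold=0.5):
    """Consume readings one at a time; return the minutes at which an alert fires."""
    window = deque(maxlen=window_size)  # keeps only the most recent readings
    alerts = []
    for reading in stream:
        window.append(reading)
        if len(window) == window_size and risk_score(window) >= threshold:
            alerts.append(reading.minute)
    return alerts


if __name__ == "__main__":
    stable = [Reading(m, 70.0, 80.0) for m in range(15)]
    falling = [Reading(m, 70.0 + 2 * m, 90.0 - 3 * m) for m in range(15)]
    print(monitor(stable))   # → []
    print(monitor(falling))  # → [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
```

In the falling example MAP drops by 3 mmHg per minute while heart rate rises, and the sketch alerts from minute 4 onward, whereas MAP first falls below 65 mmHg only at minute 9. That ordering is what "early warning" means here; with hand-set weights, however, the sketch says nothing about real sensitivity, specificity, or false-alarm burden.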

Key words

artificial intelligence / surgery / surgical foundation model / digital twin / clinical translation / medical-engineering integration

Cite this article

XIA Feng, CHEN Xiao-ping. Practice and innovation of artificial intelligence shaping the paradigm of surgery[J]. Chinese Journal of Practical Surgery. 2026, 46(5): 614-619 https://doi.org/10.19538/j.cjps.issn1005-2208.2026.05.03
CLC number: R6

Footnotes

Conflict of interest: The authors declare no conflicts of interest.

Funding

Chongqing Natural Science Foundation project (CSTB2023NSCQ-MSX0563)
