From e86e79384218e7ec401dce7cc542e27277fcb810 Mon Sep 17 00:00:00 2001
From: Franck Nijhof
Date: Thu, 12 Jun 2025 19:38:20 +0200
Subject: [PATCH] Tweak non-English issue detection (#146636)

---
 .../workflows/detect-non-english-issues.yml | 23 +++++++++++++------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/.github/workflows/detect-non-english-issues.yml b/.github/workflows/detect-non-english-issues.yml
index e33260a9cc2..264b8ab9854 100644
--- a/.github/workflows/detect-non-english-issues.yml
+++ b/.github/workflows/detect-non-english-issues.yml
@@ -64,16 +64,19 @@ jobs:
             You are a language detection system. Your task is to determine if the provided text is written in English or another language.
 
             Rules:
-            1. Analyze the text and determine the primary language
+            1. Analyze the text and determine the primary language of the USER'S DESCRIPTION only
             2. IGNORE markdown headers (lines starting with #, ##, ###, etc.) as these are from issue templates, not user input
             3. IGNORE all code blocks (text between ``` or ` markers) as they may contain system-generated error messages in other languages
-            4. Consider technical terms, code snippets, and URLs as neutral (they don't indicate non-English)
-            5. Focus on the actual sentences and descriptions written by the user
-            6. Return ONLY a JSON object with two fields:
-               - "is_english": boolean (true if the text is primarily in English, false otherwise)
+            4. IGNORE error messages, logs, and system output even if not in code blocks - these often appear in the user's system language
+            5. Consider technical terms, code snippets, URLs, and file paths as neutral (they don't indicate non-English)
+            6. Focus ONLY on the actual sentences and descriptions written by the user explaining their issue
+            7. If the user's explanation/description is in English but includes non-English error messages or logs, consider it ENGLISH
+            8. Return ONLY a JSON object with two fields:
+               - "is_english": boolean (true if the user's description is primarily in English, false otherwise)
                - "detected_language": string (the name of the detected language, e.g., "English", "Spanish", "Chinese", etc.)
-            7. Be lenient - if the text is mostly English with minor non-English elements, consider it English
-            8. Common programming terms, error messages, and technical jargon should not be considered as non-English
+            9. Be lenient - if the user's explanation is in English with non-English system output, it's still English
+            10. Common programming terms, error messages, and technical jargon should not be considered as non-English
+            11. If you cannot reliably determine the language, set detected_language to "undefined"
 
             Example response:
             {"is_english": false, "detected_language": "Spanish"}
@@ -122,6 +125,12 @@ jobs:
                 return;
               }
 
+              // If language is undefined or not detected, skip processing
+              if (!languageResult.detected_language || languageResult.detected_language === 'undefined') {
+                console.log('Language could not be determined, skipping processing');
+                return;
+              }
+
               console.log(`Issue detected as non-English: ${languageResult.detected_language}`);
 
               // Post comment explaining the language requirement
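
Illustrative note (not part of the patch): the new guard in the second hunk can be exercised in isolation with a small sketch like the one below. The function name `decideAction` and its return labels are hypothetical, and the preceding `is_english` check is an assumption, since the hunk shows only the closing lines of the guard before it. Only the `detected_language` condition is copied from the patch.

```javascript
// Hypothetical helper for illustration only; the workflow itself runs this
// logic inline in its script step and does not define such a function.
function decideAction(languageResult) {
  // Assumed earlier guard: an English result needs no further processing.
  if (languageResult.is_english) {
    return 'skip-english';
  }

  // Condition taken from the patch: a missing value or the literal string
  // 'undefined' (rule 11 of the prompt) means the language is unknown.
  if (!languageResult.detected_language || languageResult.detected_language === 'undefined') {
    return 'skip-undetermined';
  }

  // Only a named non-English language reaches the comment step.
  return 'flag-non-english';
}

// The example response from the prompt, parsed as the script would receive it.
const example = JSON.parse('{"is_english": false, "detected_language": "Spanish"}');
console.log(decideAction(example));
console.log(decideAction({ is_english: false, detected_language: 'undefined' }));
console.log(decideAction({ is_english: false }));
```

The check compares against the string 'undefined' because rule 11 asks the model to return that word as the value of `detected_language`. The `!languageResult.detected_language` half covers responses where the field is absent or empty.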