Tweak non-English issue detection (#146636)

Franck Nijhof 2025-06-12 19:38:20 +02:00 committed by GitHub
parent 7e6bb021ce
commit e86e793842


@@ -64,16 +64,19 @@ jobs:
 You are a language detection system. Your task is to determine if the provided text is written in English or another language.
 Rules:
-1. Analyze the text and determine the primary language
+1. Analyze the text and determine the primary language of the USER'S DESCRIPTION only
 2. IGNORE markdown headers (lines starting with #, ##, ###, etc.) as these are from issue templates, not user input
 3. IGNORE all code blocks (text between ``` or ` markers) as they may contain system-generated error messages in other languages
-4. Consider technical terms, code snippets, and URLs as neutral (they don't indicate non-English)
-5. Focus on the actual sentences and descriptions written by the user
-6. Return ONLY a JSON object with two fields:
-- "is_english": boolean (true if the text is primarily in English, false otherwise)
+4. IGNORE error messages, logs, and system output even if not in code blocks - these often appear in the user's system language
+5. Consider technical terms, code snippets, URLs, and file paths as neutral (they don't indicate non-English)
+6. Focus ONLY on the actual sentences and descriptions written by the user explaining their issue
+7. If the user's explanation/description is in English but includes non-English error messages or logs, consider it ENGLISH
+8. Return ONLY a JSON object with two fields:
+- "is_english": boolean (true if the user's description is primarily in English, false otherwise)
 - "detected_language": string (the name of the detected language, e.g., "English", "Spanish", "Chinese", etc.)
-7. Be lenient - if the text is mostly English with minor non-English elements, consider it English
-8. Common programming terms, error messages, and technical jargon should not be considered as non-English
+9. Be lenient - if the user's explanation is in English with non-English system output, it's still English
+10. Common programming terms, error messages, and technical jargon should not be considered as non-English
+11. If you cannot reliably determine the language, set detected_language to "undefined"
 Example response:
 {"is_english": false, "detected_language": "Spanish"}
@@ -122,6 +125,12 @@ jobs:
   return;
 }
+// If language is undefined or not detected, skip processing
+if (!languageResult.detected_language || languageResult.detected_language === 'undefined') {
+  console.log('Language could not be determined, skipping processing');
+  return;
+}
 console.log(`Issue detected as non-English: ${languageResult.detected_language}`);
 // Post comment explaining the language requirement
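
For reviewers, the combined effect of the prompt change and the new guard can be sketched as a small standalone function. This is only an illustration, not code from the workflow: `decideAction` and its three return labels are hypothetical names, and the sketch assumes that the `return;` just before the new guard belongs to the existing "issue is English" check. It also treats an unparseable reply the same way as an undetermined language, which the commit itself does not specify.

```javascript
// Hypothetical helper, not part of the workflow in this commit.
// Maps the model's raw JSON reply to one of three outcomes:
//   'skip-english'      -> description is English, nothing to do
//   'skip-undetermined' -> language unknown, do not comment on the issue
//   'flag-non-english'  -> proceed to post the language-requirement comment
function decideAction(rawReply) {
  let result;
  try {
    result = JSON.parse(rawReply);
  } catch (err) {
    // Assumption: a reply that is not valid JSON is treated as undetermined.
    return 'skip-undetermined';
  }
  if (result === null || typeof result !== 'object') {
    return 'skip-undetermined';
  }

  // Assumed to mirror the existing early return for English issues.
  if (result.is_english === true) {
    return 'skip-english';
  }

  // Mirrors the guard added in this commit: a missing value or the
  // literal string "undefined" (rule 11 of the prompt) stops processing.
  const language = result.detected_language;
  if (!language || language === 'undefined') {
    return 'skip-undetermined';
  }

  return 'flag-non-english';
}

console.log(decideAction('{"is_english": false, "detected_language": "Spanish"}'));
console.log(decideAction('{"is_english": false, "detected_language": "undefined"}'));
```

Under these assumptions the first call yields `flag-non-english` and the second yields `skip-undetermined`, so an issue whose language the model cannot determine is never commented on.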