Java: how to split a string between letters and digits (or between digits and letters)?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/8270784/

Time: 2020-08-15 00:29:23  Source: igfitidea

How to split a string between letters and digits (or between digits and letters)?

java, regex, string

Asked by The Crazy Chimp

I'm trying to work out a way of splitting a string in Java that follows a pattern like this:

String a = "123abc345def";

The results from this should be the following:

x[0] = "123";
x[1] = "abc";
x[2] = "345";
x[3] = "def";

However, I'm completely stumped as to how to achieve this. Could someone please help me out? I have tried searching online for a similar problem, but it is very difficult to phrase it correctly in a search.

Please note: the number of letters and digits may vary (e.g. there could be a string like '1234a5bcdef').

Accepted answer by Qtax

You could try to split on (?<=\D)(?=\d)|(?<=\d)(?=\D), like:

str.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");

It matches positions between a number and not-a-number (in any order).

  • (?<=\D)(?=\d) - matches a position between a non-digit (\D) and a digit (\d).
  • (?<=\d)(?=\D) - matches a position between a digit and a non-digit.
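
To make the two lookaround alternatives concrete, here is a minimal runnable sketch of the split. The class name LookaroundSplit and the helper splitDigitBoundaries are only illustrative names, not part of the answer. Note that inside a Java string literal each backslash has to be doubled.

```java
import java.util.Arrays;

public class LookaroundSplit {
    // Splits at every zero-width position where a digit meets a non-digit
    // (in either order). Nothing is consumed, so no characters are lost.
    static String[] splitDigitBoundaries(String s) {
        return s.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
    }

    public static void main(String[] args) {
        // Prints [123, abc, 345, def]
        System.out.println(Arrays.toString(splitDigitBoundaries("123abc345def")));
        // Prints [1234, a, 5, bcdef]
        System.out.println(Arrays.toString(splitDigitBoundaries("1234a5bcdef")));
    }
}
```

Because \D means "anything that is not a digit", punctuation and whitespace end up in the non-digit pieces along with letters.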

Answer by mishadoff

Use two different patterns, [0-9]* and [a-zA-Z]*, and split twice, once by each of them.
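
A rough sketch of what splitting twice could look like. It rests on two assumptions that are not in the answer: the quantifier is + rather than * (with *, the pattern also matches the empty string, so split would break between every character), and empty pieces are filtered out. The names SplitTwice and nonEmptyParts are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitTwice {
    // Splits s by regex and drops empty pieces (split leaves a leading ""
    // when the input starts with a delimiter match).
    static List<String> nonEmptyParts(String s, String regex) {
        List<String> parts = new ArrayList<>();
        for (String p : s.split(regex)) {
            if (!p.isEmpty()) {
                parts.add(p);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String a = "123abc345def";
        // Digits act as delimiters, so the letter runs remain: [abc, def]
        System.out.println(nonEmptyParts(a, "[0-9]+"));
        // Letters act as delimiters, so the digit runs remain: [123, 345]
        System.out.println(nonEmptyParts(a, "[a-zA-Z]+"));
    }
}
```

This produces two separate lists, so the original interleaving of digits and letters is not preserved without extra bookkeeping.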

Answer by Mario

I haven't used Java for ages, so this is just some pseudo code that should help get you started (faster for me than looking everything up :) ).

 string a = "123abc345def";
 string[] result;
 while(a.Length > 0)
 {
      string part;
      if((part = a.Match(/^\d+/)).Length) // leading run of digits
           ;
      else if((part = a.Match(/^[a-zA-Z]+/)).Length) // leading run of letters
           ;
      else
           break; // something invalid - neither digit nor letter
      result.append(part);
      a = a.SubStr(part.Length); // drop the part we've just consumed
 }
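
One possible Java rendering of the pseudo code above, offered as a sketch rather than the answerer's own code. It assumes "letters" means ASCII [a-zA-Z] and uses Matcher.lookingAt(), which only matches at the start of the input. The names ConsumeFromFront and tokenize are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConsumeFromFront {
    private static final Pattern DIGITS = Pattern.compile("\\d+");
    private static final Pattern LETTERS = Pattern.compile("[a-zA-Z]+");

    static List<String> tokenize(String a) {
        List<String> result = new ArrayList<>();
        while (!a.isEmpty()) {
            Matcher m = DIGITS.matcher(a);
            if (!m.lookingAt()) {          // no digits at the front
                m = LETTERS.matcher(a);
                if (!m.lookingAt()) {      // no letters either
                    break;                 // something invalid: stop here
                }
            }
            result.add(m.group());
            a = a.substring(m.end());      // remove the part we've found
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("123abc345def")); // [123, abc, 345, def]
    }
}
```

Re-slicing the string on every iteration keeps the sketch close to the pseudo code; for long inputs, tracking an index instead would avoid the repeated substring copies.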

Answer by nullpotent

How about:

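import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Collects each run of digits, lowercase letters, or uppercase letters.
// Lowercase and uppercase runs are matched separately; other characters are skipped.
private List<String> Parse(String str) {
    List<String> output = new ArrayList<String>();
    Matcher match = Pattern.compile("[0-9]+|[a-z]+|[A-Z]+").matcher(str);
    while (match.find()) {
        output.add(match.group());
    }
    return output;
}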

Answer by Th? Anh Nguy?n

You can try this:

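import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The backslash must be doubled inside a Java string literal.
Pattern p = Pattern.compile("[a-z]+|\\d+");
Matcher m = p.matcher("123abc345def");
ArrayList<String> allMatches = new ArrayList<>();
while (m.find()) {
    allMatches.add(m.group());
}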

The result (allMatches) will be:

["123", "abc", "345", "def"]

Answer by sergeyan

If you are looking for a solution that does not use Java String functionality (i.e. split, match, etc.), then the following should help:

import java.util.ArrayList;
import java.util.List;

List<String> splitString(String string) {
    List<String> list = new ArrayList<String>();
    int e = 0;
    // An empty input yields an empty list instead of throwing on charAt(0).
    while (e < string.length()) {
        // The first character of a run decides whether it is a digit run
        // or a non-digit run; keep consuming while the kind stays the same.
        boolean digits = isNumber(string.charAt(e));
        StringBuilder token = new StringBuilder();
        while (e < string.length() && isNumber(string.charAt(e)) == digits) {
            token.append(string.charAt(e++));
        }
        list.add(token.toString());
    }
    return list;
}

boolean isNumber(char c) {
    return c >= '0' && c <= '9';
}

This solution splits the input into numbers and 'words', where a 'word' is any run of characters that contains no digits. If you want 'words' to contain only English letters, you can easily modify it by adding more conditions (like the isNumber method call) depending on your requirements (for example, you may wish to skip words that contain non-English letters). Also note that the splitString method returns an ArrayList, which can later be converted to a String array.

Answer by Tatarize

I was doing this sort of thing for mission-critical code, where every fraction of a second counts because I need to process 180k entries in an unnoticeable amount of time. So I skipped the regex and split altogether and allowed for inline processing of each element (though adding them to an ArrayList<String> would be fine). If you want to do this exact thing but need it to be something like 20x faster...

void parseGroups(String text) {
    int last = 0;
    int state = 0;
    for (int i = 0, s = text.length(); i < s; i++) {
        switch (text.charAt(i)) {
            case '0':
            case '1':
            case '2':
            case '3':
            case '4':
            case '5':
            case '6':
            case '7':
            case '8':
            case '9':
                if (state == 2) {
                    processElement(text.substring(last, i));
                    last = i;
                }
                state = 1;
                break;
            default:
                if (state == 1) {
                    processElement(text.substring(last, i));
                    last = i;
                }
                state = 2;
                break;
        }
    }
    processElement(text.substring(last));
}
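
The method above leaves processElement undefined. As a hedged sketch of how the same single-pass scan could be wired up, the variant below takes the callback as a java.util.function.Consumer and replaces the switch with a boolean comparison. The names GroupScanner and isAsciiDigit, and the early return on empty input, are additions for illustration and not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class GroupScanner {
    // Walks the text once and hands each digit or non-digit run to the callback.
    static void parseGroups(String text, Consumer<String> processElement) {
        if (text.isEmpty()) {
            return; // assumption: an empty input produces no elements
        }
        int last = 0;
        boolean prevDigit = isAsciiDigit(text.charAt(0));
        for (int i = 1, s = text.length(); i < s; i++) {
            boolean digit = isAsciiDigit(text.charAt(i));
            if (digit != prevDigit) {      // kind changed: a run just ended
                processElement.accept(text.substring(last, i));
                last = i;
                prevDigit = digit;
            }
        }
        processElement.accept(text.substring(last)); // final run
    }

    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        parseGroups("1234a5bcdef", out::add);
        System.out.println(out); // [1234, a, 5, bcdef]
    }
}
```

No timing claim is made for this variant; whether it keeps the speed-up described above would need to be measured.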

Answer by Andrew Anderson

Wouldn't "\\d+|\\D+" do the job instead of the cumbersome "(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)"?