1.1 Is Unique

Created: August 22, 2021 1:13 AM Tags: array, ascii, bit vector, string

1.1 Is Unique : Implement an algorithm to determine if a string has all unique characters. What if you cannot use additional data structures?

Answer) If the string is an ASCII string, I can determine if the string has all unique characters or not by using an array of boolean values, where the flag at index i indicates whether character i in the alphabet is contained in the string. The second time the character is found I can immediately return False.

  ################Solution 1 : 아스키코드와 boolean flag 배열 이용##########
  def isUniqueChars(s):
      if len(s) > 128:
          return False #if string is longer than the number of ASCII code, return False
      char_set = [False for i in range(128)]
      for i in range(len(s)):
          ascii_val = ord(s[i])
          if(char_set[ascii_val]): #이미 한개 존재하면
              return False
          char_set[ascii_val] = True
      return True

Also, if I cannot use additional data structures, I can compare every character of the string to every other character of the string as below.

  ################Solution 2 : 전체 탐색 O(n2)##########
  def isUniqueCharsNoDS(s):
      for i in range(len(s)):
          ch = s[i]
          for j in range(i+1, len(s)):
              if ch == s[j]:
                  return False
      return True

Otherwise, I can sort the string first, and then linearly check the string for neighboring characters that are identical as below.

  ################Solution 3 : Sorting 후 이웃 비교 O(n*logn)##########
  def isUniqueCharsSorting(s):
      str = sorted(s)
      for i in range(1, len(str)):
          if str[i-1] == str[i]:
              return False
      return True

답변) string이 아스키 문자열이라고 가정했을 때, 길이 128의 flag를 갖는 Boolean 배열을 이용하여 문자열이 all unique characters를 갖는지 확인할 수 있을 것 같습니다. 문자열의 길이가 아스키코드의 전체 개수 128보다 크다면 무조건 하나 이상의 중복 문자를 가질 것이고, 128이하일 때 문자열이 각 문자를 갖는지에 대해 flag를 만들어 두번 발견되면 즉시 False를 리턴하여 해결할 수 있을 것 입니다. 아래와 같이 구현하였습니다.

추가로, 만약 additional data structures를 이용할 수 없다면 문자열의 전체 문자를 중복이 존재하는지 전체 탐색할 수 있습니다.

혹은 문자열의 문자들을 정렬한 후 이웃 간에 같은 문자가 있다면 중복 문자가 존재하는 것으로 판단할 수 있습니다. 이때, 전체 탐색 O(n2)에 비해 O(n*logn) 으로 시간복잡도는 낮아질 것 입니다.

추가 답변) 파이썬의 set을 이용하여도 해결 가능

⇒ set으로 중복된 데이터를 날린 후 날라간 데이터가 있다면 중복된 문자가 있다고 판단 가능.

  def isUnique(s):
      str = set(s)
      arr_s = [ch for ch in s]

      str = sorted(str)
      arr_s = sorted(arr_s)

      if str != arr_s:
          return False
      else:
          return True

추가 답안) bit vector를 이용하여 space usage를 더 적게 사용할 수도 있다.

  boolean isUniqueChars(String str) {
    int checker= 0;
    for (int i= 0; i < str.length(); i++) {
    int val= str.charAt(i) - 'a';
    if ((checker & (1 << val)) > 0) {
      return false;
    }
    checker |= (1 << val);
    }
     return true;
  }