[자료구조] Trie

lnacles 2023. 8. 1. 15:57

트라이 자료구조란 단어나 문자를 검색할 때 사용이 됩니다. 그렇다고 Hash를 쓰면 되는데 왜 트라이 자료구조를 사용하는가? 문자열은 데이터 사이즈가 숫자에 비해 크다 1억 개의 데이터를 HashMap으로 구현한다면 어마어마한 공간이 필요할 것입니다. 메모리를 줄여 문자열 간에 중복되는 문자를 최적화하는 것은 트라이의 목적입니다.

트라이는 Key Value로 트리 자료구조를 구현하며 문자열 길이가 m일 때 O(m)의 시간복잡도를 가지게 됩니다.

아래 [그림 1]은 apple 과 banana 문자를 트라이 자료구조로 구현했을 때입니다.

만약 단어를 추가하는데 이미 존재하는 단어 일 땐 다음과 같이 T/F로 표기합니다.

apple 단어가 추가되어있을 때 app을 추가하면 apple의 e와 app의 마지막 p의 flag로 표현할 수 있습니다. [그림 2] 참고

apple과 유사한 apply단어가 추가되면 key value에서 key(l)에 value (e, y)가 추가되는 메커니즘입니다.

트라이에 문자열을 삭제하는 기능도 존재합니다. 문자열이 추가될 때 기존에 없던 문자열이 추가되는 방법과 기존에 존재하는 문자열의 일부분임을 주의해야 합니다. value로 key를 확정할 수 없으니 루트에서부터 재귀적으로 구현했습니다. 아래는 트라이 자료구조를 구현해 봤습니다.

public class Trie {
	class TrieNode{
		Map<Character, TrieNode> child = new HashMap<Character, Trie.TrieNode>();
		boolean word;
		public TrieNode() {
			
		}
		public TrieNode(Map<Character, TrieNode> child, boolean word) {
			this.child = child;
			this.word = word;
		}		
	}
    
	TrieNode rootNode;
    
	public Trie() {
		rootNode = new TrieNode(); 
	}
    
	void add(String word) {
		TrieNode node = this.rootNode;
		for (int i = 0; i < word.length(); i++) {
			node = node.child.computeIfAbsent(word.charAt(i), key->new TrieNode());
		}
		node.word = true;
	}
    
	boolean contains(String word) {
		TrieNode node = this.rootNode;
		for (int i = 0; i < word.length(); i++) {
			if(!node.child.containsKey(word.charAt(i))) return false;
			node = node.child.get(word.charAt(i));
		}
		return node.word;
	}
    
	void delete(String word) {
		delete(this.rootNode,word,0);
	}
    
	private void delete(TrieNode node, String word, int i) {
		if(i == word.length()) {
			if(!node.word) throw new Error("존재 하지 않습니다.");
			node.word = false;
			return ;
		}
		if(!node.child.containsKey(word.charAt(i))) throw new Error("존재 하지 않습니다.")	;
		TrieNode c = node.child.get(word.charAt(i));
		delete(c,word,i+1);
		if(c.child.isEmpty() && !c.word) 
			node.child.remove(word.charAt(i), c);
	}
    
}