Leetcode Problem - Alien Dictionary

Problem link: https://leetcode.com/problems/alien-dictionary/
Difficulty: Hard
Category: Graph

Step 1: Grasping the Problem

This is a popular hard problem where you are provided a group of letters (like dictionaries) and you have to determine the order of the characters the dictionary’s words are sorted by.

In one sentence, we need to sort characters given a group of words. So first we need to detect which characters will come first. Of course the first letter in the first word will come first in lexicographic order. But after that, we will need to keep a relationship between character of who comes before who depending on the order of the words.

So, keeping a relationship between objects tells us that this problem could be solved by graph. And we need to make an order of characters where each character comes first when no other character is above that one in the words list. In other words, a character will be picked only when there are no other characters which depend on it. This is the perfect problem to be solved by topological sorting.

Step 2: Build the graph

To solve the problem with topological sorting, we need a graph first. In this graph, the nodes will be characters and the edges will be the relationship stating “this character must come before the other character”.

While building the graph, we will use a built-in datatype called defaultdict. In python, defaultdict is a dictionary where each non-existent key access will return a default value without raising an exception.

We will use a commonly used built-in generator function named zip. What zip does is it zips two container data structures so that one can easily loop through both of that at the same time. Without zip we would have to do something like this:

1
2
3
for word_ind in range(len(words)-1):
    word1 = words[word_ind]
    word2 = words[word_ind + 1]

But with the help of zip, we can just loop over these without taking extra index variable word_ind:

1
for word1, word2 in zip(words, words[1:]):

So first we create a graph of defaultdict type, loop through the array or words, then loop through the characters of that word, and create an edge for each consecutive words.

1
2
3
4
5
6
7
8
# Build graph
graph = defaultdict(set)
for word1, word2 in zip(words, words[1:]):
    for char1, char2 in zip(word1, word2):
        if char1 != char2:
            if char2 not in graph[char1]:
                graph[char1].add(char2)
            break

While building the graph, an optimization can be done for a corner case. What if multiple words are prefixes of one another. Then there will be no way of sorting them in lexicographic order. We can check then only if the char1 != char2 is always False. And then, the for loop will never hit the break statement. We could use a flag to check if the break is hit or not.

Or, we can use another cool python feature of python named for/else. Yes, else block can be used after a for block in python and it does exactly what is written above. After checking this with else, we can return null string if we cannot sort the characters. So the code will be:

1
2
3
4
# Check that second word isn't a prefix of first word.
else:  
    if len(word2) < len(word1):
        return ""

Step 3: Build an `in_degree` dict for topological sorting

According to wikipedia:

For a vertex, the number of head ends adjacent to a vertex is called the indegree of the vertex

Now we need to get to know the in_degree of each node in the graph. Because topological sorting starts (and uses) with the nodes which has in-degrees’ of zero.

First we initialize in_degree with all the characters of words with 0 (this should be done before building the graph so that we can use it while creating the graph)

1
2
3
4
5
# initialize in degree of graph
in_degree = {}
for word in words:
    for char in word:
        in_degree[char] = 0

Finally, we increment a character’s in_degree when we create an edge while building the graph.

1
2
3
if char2 not in graph[char1]:
    graph[char1].add(char2)
    in_degree[char2] += 1 # new line for in_degree

The codes might be looking out of order to you and difficult to understand, so please see the full program given at the end to make sure you understand the order of code snippets used above.

Step 4: Topological Sorting

Topological Sorting is like BFS (Breadth First Search) or DFS (Depth First Search) where instead of starting from a source node, we start from all the nodes that have in_degree[node] == 0. For bfs, we need to use a queue data structure. Python has built-in deque which can be used as a queue with same time and space complexity. Then we push all the nodes that has in_degree[node] == 0 to the queue.

1
2
3
4
5
# Identify vertices that has in_degree = 0
queue = deque()
for char in in_degree:
    if in_degree[char] == 0:
        queue.append(char)

After that, we keep getting nodes from the queue and update all it’s neighboring nodes in_degree. At the same time when popping nodes from the queue, we will append that with out resulting string aliendict.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
# Do the topological sort
aliendict = ""
while queue:
    node = queue.popleft()
    aliendict += node
    # Loop through neighbors set
    for char in graph[node]:
        in_degree[char] -= 1
        if in_degree[char] == 0:
            queue.append(char)

Finally we will need to check if all the characters are inside the aliendict. If they are not inside the aliendict, then we are sure that at some stage, there was a cycle in topological sorting. So there is no single way of sorting the characters given the words. At that time, we will return a null string.

1
2
3
4
if len(aliendict) < len(in_degree):
    return ""

return aliendict

This is the final solution of the program. Topological sorting takes O(|V| + |E|) time and O(|V|) space complexity where V is the list of vertices (characters for word in words in this problem), and E is the list of edges in the graph (|words| * (characters in each word) in this problem). The complete solution is given below:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
from collections import defaultdict, deque

class Solution:
    def alienOrder(self, words: [str]) -> str:
        # initialize in degree of graph
        in_degree = {}
        for word in words:
            for char in word:
                in_degree[char] = 0

        # Build graph
        graph = defaultdict(set)
        for word1, word2 in zip(words, words[1:]):
            for char1, char2 in zip(word1, word2):
                if char1 != char2:
                    if char2 not in graph[char1]:
                        graph[char1].add(char2)
                        in_degree[char2] += 1
                    break
            # Check that second word isn't a prefix of first word.
            else:  
                if len(word2) < len(word1):
                    return ""

        # Identify vertices that has in_degree = 0
        queue = deque()
        for char in in_degree:
            if in_degree[char] == 0:
                queue.append(char)

        # Do the topological sort
        aliendict = ""
        while queue:
            node = queue.popleft()
            aliendict += node
            # Loop through neighbors set
            for char in graph[node]:
                in_degree[char] -= 1
                if in_degree[char] == 0:
                    queue.append(char)

        if len(aliendict) < len(in_degree):
            return ""

        return aliendict

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
class Solution {
public:
    string alienOrder(vector<string> &words) {
        // initialize in degree of graph
        unordered_map<char, int> inDegree;
        for(auto word: words)
            for(auto character: word)
                inDegree[character] = 0;

        // Build graph
        unordered_map<char, vector<char>> graph;
        for(int word_ind = 0; word_ind < words.size() - 1; word_ind++) {
            string word1 = words[word_ind];
            string word2 = words[word_ind + 1];

            bool loopBroken = false;
            for(int char_ind = 0; char_ind < min(word1.length(), word2.length()); char_ind++) {
                char char1 = word1[char_ind];
                char char2 = word2[char_ind];
                if(char1 != char2) {
                    if(graph.find(char1) == graph.end()) {
                        graph[char1].push_back(char2);
                        inDegree[char2]++;
                    }
                    loopBroken = true;
                    break;
                }
            }
            // Check that second word isn't a prefix of first word
            if(!loopBroken && word2.length() < word1.length())
                return "";
        }

        // Identify vertices that has inDegree = 0
        queue<char> q;
        for(auto charDegreePair: inDegree)
            if(charDegreePair.second == 0)
                q.push(charDegreePair.first);
        
        // Do the topological sort
        string alienDict;
        while(!q.empty()) {
            auto node = q.front();
            q.pop();
            alienDict.push_back(node);
            // Loop through neighbors set
            for(auto character: graph[node]) {
                inDegree[character]--;
                if(inDegree[character] == 0)
                    q.push(character);
            }
        }

        if(alienDict.length() < inDegree.size())
            return "";
        
        return alienDict;
    }
};

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
class Solution {
    public String alienOrder(String[] words) {
        // initialize in degree of graph
        Map<Character, Integer> inDegree = new HashMap<Character, Integer>();
        for(String word : words)
            for(Character character : word.toCharArray())
                inDegree.put(character, 0);

        // Build graph
        Map<Character, List<Character>> graph = new HashMap<Character, List<Character>>();
        for(int word_ind = 0; word_ind < words.length - 1; word_ind++) {
            String word1 = words[word_ind];
            String word2 = words[word_ind + 1];

            boolean loopBroken = false;
            for(int char_ind = 0; char_ind < Math.min(word1.length(), word2.length()); char_ind++) {
                Character char1 = word1.charAt(char_ind);
                Character char2 = word2.charAt(char_ind);
                if(char1 != char2) {
                    if(!graph.getOrDefault(char1, new ArrayList<Character>()).contains(char2)) {
                        List<Character> prevList = graph.getOrDefault(char1, new ArrayList<Character>());
                        prevList.add(char2);
                        graph.put(char1, prevList);
                        inDegree.put(char2, inDegree.get(char2)+1);
                    }
                    loopBroken = true;
                    break;
                }
            }
            // Check that second word isn't a prefix of first word
            if(!loopBroken)
                if(word2.length() < word1.length())
                    return "";
        }


        // Identify vertices that has inDegree = 0
        Queue<Character> queue = new LinkedList<>();
        inDegree.forEach((character, charInDegree) -> {
            if(charInDegree == 0)
                queue.add(character);
        });

        // Do the topological sort
        StringBuilder alienDict = new StringBuilder();
        while(!queue.isEmpty()) {
            Character node = queue.remove();
            alienDict.append(node);
            // Loop through neighbors set
            for(Character character :graph.getOrDefault(node, new ArrayList<>())) {
                inDegree.put(character, inDegree.get(character)-1);
                if(inDegree.get(character) == 0)
                    queue.add(character);
            }
        }

        if(alienDict.length() < inDegree.size())
            return "";
        
        return alienDict.toString();
    }
}