# Allergen Assessment

`2020-12-30`

Original Prompt ## Part 1

If you’re having trouble parsing this problem, don’t worry - it’s pretty confusing.

Each allergen is exactly one of the nonsense words and they are **not** always listed. Here’s the example, but with simplified words so that we (as humans) can more easily read it:

`A B C D (contains 1, 2)E F G A (contains 1)C F (contains 3)C A G (contains 2)`

We know that `A`

must be `1`

because it’s the only ingredient that appears in both lines 1 and 2. `2`

could be anything that’s in both lines 1 and 4, so both `C`

and `A`

are potential allergens. `3`

only appears once, so everything in that line are potentials allergens as well. All in all, `A, C, F`

are the potential allergens. Let’s code that up.

The input parsing is pretty straightforward. We’ll be making liberal use of `set`

s, since they’re easy to get sums and differences for:

```
translation = {}safe_ingredients = set()ingredients_lists: List[Set[str]] = []
for line in self.input: raw_ingredients, raw_allergens = line.rstrip(")").split(" (contains ") ingredient_set = set(raw_ingredients.split(" "))
ingredients_lists.append(ingredient_set) # need this soon safe_ingredients.update(ingredient_set) # need this soon
for allergen in raw_allergens.split(", "): if allergen in translation: translation[allergen] = translation[allergen] & ingredient_set else: translation[allergen] = ingredient_set
```

We might have used a `defaultdict`

here for `translation`

, but `set() & {1,2}`

is always `set()`

, so we need to only do the intersection with non-empty data. If we run the above on the example, we get:

`{ 'dairy': {'mxmxvkd'}, 'fish': {'sqjhc', 'mxmxvkd'}, 'soy': {'sqjhc', 'fvjkl'}}`

The prompt asks about the number of occurrences of the ingredients that *aren’t* in those sets. While we walk the list, we’ll create a set (`safe_ingredients`

) that has *all* the ingredients and pull from it after that first pass:

```
for line in self.input: ...
for unknown_allergen in possible_translations.values(): safe_ingredients -= unknown_allergen
```

All that’s left is to walk each ingredient set and count how many times safe ingredients appear.

`return sum( [len(ingredients & safe_ingredients) for ingredients in ingredients_lists])`

## Part 2

Now we actually have to figure out which allergens map to which ingredients. This’ll play out a lot like day 16. We start with any allergens that map to a single ingredient (like `A`

in the cleaned example above), then remove that one from all other places it’s a potential. We keep doing that until all the potential values are empty.

The only tricky thing here is that we have to take care to not modify the size of the dict as we’re iterating through it. Instead, we consume the generator before iterating through it, which looks a little unusual:

```
found_allergens = {}while possible_translations: # consume the generator before we start walking for allergen, ingredients in list( (k, v) for k, v in possible_translations.items() if len(v) == 1 ): found_ingredient = ingredients.pop() found_allergens[allergen] = found_ingredient del possible_translations[allergen] # remove the ingredient from contention in any remaining unknowns for a in possible_translations: possible_translations[a].discard(found_ingredient)
return ",".join(v for _, v in sorted(found_allergens.items()))
```