Fairy Tale Summarizer

As part of a 4th year Artificial Intelligence course at the University of Waterloo, I developed a program to summarize fairy tales. The program and analysis were designed and implemented in partnership with Nick Breen (www.nickbreen.ca).


Our text summarizer focuses on summarizing fairy tales.

Here is an explanation of how our code works and the scoring methods employed:

  1. Read in the fairy tale, identify the sentences, and group them by paragraph.
  2. Identifying Keywords:
    -Frequently used proper nouns are probably important characters in the story.
    -Words in the title of the story are keywords.
    -Try to identify the Protagonist and Antagonist using our lists of ‘positive’ and ‘negative’ attributes.
      Protagonist: one of the most frequent proper nouns, with the highest number of ‘positive’ words in the same sentences.
      Antagonist: another of the most frequent proper nouns, with the highest number of ‘negative’ words in the same sentences.
  3. Score the sentences using the following criteria:
    +1 for keywords
    +2 for first sentence in a paragraph
    +1 for first paragraph
    +1 for last paragraph
    +3 for protagonist
    +3 for antagonist
  4. Choosing Sentences for summary:
    We want the highest scores, but also want a summary that spans the entire story.
    So for each paragraph, select the sentence with the highest score and include it in our summary. The sentence must also have a score that satisfies the constant threshold; this prevents us from choosing a low-scoring sentence in a paragraph of all low-scoring sentences.
    Also add, if not already included, the following sentences, which are known to be important in the fairy tale domain:

    Sentence introducing conflict:
    -Highest-scoring sentence with both protagonist and antagonist; if none exists, use the highest-scoring sentence with the protagonist.

    Conclusion of Conflict:
    -Last sentence with the antagonist that scores above 1/3 of the threshold. We do not want to remove the threshold, but it should carry less weight.

    Happily Ever After:
    -Last sentence with the protagonist that scores above 1/3 of the threshold. We do not want to remove the threshold, but it should carry less weight.

    Any sentences over 2+ threshold, and not already included:
    -This prevents us from missing a very important sentence just because it's in a paragraph with another very important sentence.
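
The scoring and per-paragraph selection steps above can be sketched roughly as follows. This is a minimal Python sketch, not our original program: the `Sentence` class, the function names, and the simple word splitting are all hypothetical. It assumes that "+1 for keywords" means +1 for each distinct keyword in the sentence, that a score "satisfies" the threshold when it is at least the threshold, and that character names are single words. The extra fairy-tale-specific sentences (conflict, conclusion, happily ever after) are left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    paragraph: int            # index of the paragraph containing the sentence
    first_in_paragraph: bool  # True if the sentence opens its paragraph


def score_sentence(s, keywords, protagonist, antagonist, last_paragraph):
    """Apply the scoring criteria from step 3 to one sentence."""
    # Lowercased words with surrounding punctuation stripped.
    words = {w.strip(".,;:!?\"'").lower() for w in s.text.split()}
    score = 0
    # Assumption: +1 for each distinct keyword found in the sentence.
    score += sum(1 for k in keywords if k.lower() in words)
    if s.first_in_paragraph:
        score += 2
    if s.paragraph == 0:
        score += 1
    if s.paragraph == last_paragraph:
        score += 1
    # Assumption: protagonist and antagonist are single-word names.
    if protagonist.lower() in words:
        score += 3
    if antagonist.lower() in words:
        score += 3
    return score


def select_by_paragraph(sentences, scores, threshold):
    """Pick the top-scoring sentence of each paragraph if it meets the threshold."""
    best = {}  # paragraph index -> index of the best sentence seen so far
    for i, s in enumerate(sentences):
        if scores[i] < threshold:
            continue  # low-scoring sentences are never selected
        j = best.get(s.paragraph)
        if j is None or scores[i] > scores[j]:
            best[s.paragraph] = i
    return sorted(best.values())  # sentence indices in story order
```

Because sentences below the threshold are skipped before the per-paragraph comparison, a paragraph made up entirely of low-scoring sentences contributes nothing to the summary.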


Evaluating our summarizer

The evaluation of a text summarizer depends greatly on what the program will be used for. This determines how accurate the summarizer must be, which criteria are most important, and which information is most important to the reader.

We have come up with a couple of different ways of evaluating our summarizer based on the user's goals.

Identifying known fairy tales

Our first evaluation method would be for users who just need enough of a summary to tell which fairy tale it is. This evaluation works very well with this subject matter because fairy tales can be taken for granted as general knowledge, and only require a couple of clues for the user to be able to figure it out.

Reverse Scoring with Questions: Give people the auto-generated summaries and see if they can identify the fairy tale. Then conduct the same survey with another sample group, using human-generated ‘ideal’ extract summaries instead. Use the percentage of correct answers from the ideal summaries as the threshold to determine whether the automated version has succeeded.

This evaluation will let us see how well the program works for helping people identify the fairy tale based on the summary.

Finding Main Points

Our second idea is to focus on the main points of the fairy tale rather than on which sentences were extracted. We would get human experts to list what they think the main points are, and then use this list as a marking scheme, like an exam marker would. If the summary covers a point, it gets a mark; this way it doesn’t matter which sentences it chose, only that the summary has the important information. We use this method to determine the Retention Ratio. A precondition is that the experts must be able to create an extract summary that satisfies all of their main points, i.e. one that gets a perfect Retention Ratio.


 

Performing an evaluation of our summarizer

In the section above, we mentioned two possible strategies for evaluating the summarizer. We did not perform the first because it requires a large sample of human participants to be effective. The second method only requires a few experts to provide a summary of the main points of each story; we then mark each summary to see how many of the points it covers, which gives us our Retention Ratio (RR).

We also have the Compression Ratio (CR) for each summary, so we can look at both of these to get an idea of how well the summarizer is performing.

We believe that a good fairy tale summary should be <25% of the original document, and retain over 75% of the information.
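
The two ratios and the acceptance check can be written down in a few lines. This is only a sketch under stated assumptions: the function names are hypothetical, and lengths are measured in words, which is one plausible choice since the unit is not fixed above.

```python
def compression_ratio(summary, original):
    """Summary length as a fraction of the original length.

    Assumption: length is counted in whitespace-separated words.
    """
    return len(summary.split()) / len(original.split())


def retention_ratio(marks):
    """Fraction of main points covered; marks holds one 0 or 1 per main point."""
    return sum(marks) / len(marks)


def is_acceptable(cr, rr, max_cr=0.25, min_rr=0.75):
    """True if the summary is under 25% of the original and retains over 75%."""
    return cr < max_cr and rr > min_rr
```

Both comparisons are strict, so a summary that retains exactly 75% of the main points does not pass.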

Cinderella Eval

Compression Ratio: %13.91
Retention Ratio: 8/12 = %66
Main Points:
Cinderella is a nice and beautiful girl.  1
Cinderella has mean step sisters. 1
The Prince is hosting a ball 0
Cinderella’s godmother magically gives her a magnificent coach and clothing to go to the ball. 1
Cinderella must be back by midnight. 1
The prince is entranced by Cinderella’s beauty. 0
Cinderella leaves before the prince finds out who she is. 1
Cinderella accidentally leaves her slipper at the ball. 1
The prince tries to find her by looking for who the shoe fits. 1
The slipper fits Cinderella. 1
Cinderella and the prince marry, and they live happily ever after. 0
Cinderella is kind to her stepsisters, and does not exact revenge. 1

 


To see analysis of more stories please scroll to the bottom.


Overall Performance

Average RR: %75.4 (With Red Riding Hood)
Average RR: %79.6 (Without Red Riding Hood)

Average CR: %19.6 (With Red Riding Hood)
Average CR: %17.2 (Without Red Riding Hood)

These averages fall within our acceptable range for an adequate Fairy Tale summary, compression < 25% and retention ratio >75%.

We have singled out the poor performance on Little Red Riding Hood because our program treats Little, Red, Riding, and Hood as four different characters. As a result, our algorithm ignores the woodsman and the grandmother. We are aware of this problem but did not have time to implement a fix. It is also an edge case, since main characters often have only a one- or two-word name.
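
To illustrate the problem, the sketch below counts capitalized words one at a time, which reproduces the behaviour described above, and then shows one possible fix that is not part of our program: treating a run of consecutive capitalized words as a single name. The regular expressions and function names are assumptions made for illustration, and sentence-initial words such as "The" are still picked up as candidates.

```python
import re
from collections import Counter


def single_word_names(text):
    """Count every capitalized word separately."""
    return Counter(re.findall(r"\b[A-Z][a-z]+\b", text))


def multi_word_names(text):
    """Count each run of consecutive capitalized words as one name."""
    return Counter(re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b", text))
```

With the second function, "Little Red Riding Hood" is counted as one candidate character instead of four, which would leave room for other characters among the most frequent names.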


Rapunzel Eval

Compression Ratio:  %15.87
Retention Ratio:  9/11 =  %81.8
Main Points:
Rapunzel has long beautiful hair. 1
Rapunzel is trapped in a tower by the witch. 1
The witch is mean. 1
One must use Rapunzel’s long hair to climb up the tower.  1
A prince falls in love with Rapunzel. 1
The witch cuts off Rapunzel’s hair. 1
The witch banishes Rapunzel to the desert. 0
The witch tricks the prince into climbing up the tower. 1
The prince wanders the wilderness alone. 1
The prince finds Rapunzel 1
The two live happily ever after 0

Hansel and Gretel Eval

Compression Ratio: %18.37
Retention Ratio: 10/12 = %83.33
Main Points:
Hansel and Gretel’s parents try to lose their children in the woods. 0
Hansel and Gretel make it back to their parents’ house using a stone trail. 1
Mother is angry that they return. 1
Hansel and Gretel are brought into the woods again. 1
Hansel and Gretel do not find their way back home. 1
Hansel and Gretel find a house made out of edible candy and start eating. 1
Hansel and Gretel meet an old woman who owns the house. 1
The woman is an evil witch and traps them. 1
Hansel tricks the witch. 1
Hansel and Gretel kick the witch into the fire. 0
They left the witch’s house to return home. 1
The kids and the father lived happily ever after with the witch’s riches. 1

Snow White Eval  *

* Increasing the compression ratio to %32 leads to an RR of 13/14 = %92.9
Compression Ratio: %20.66
Retention Ratio: 12/14 = %86
Main Points:
Queen has a baby girl, named Snow White 1
Queen dies 1
King marries new lady, who is evil 0
New Queen has a mirror that tells the Queen she is the most beautiful 1
One day Mirror says Snow White is most beautiful 0
Queen tries to kill Snow White 1
Snow White ends up in the forest 1
Snow White ends up living with 7 dwarves 1
Queen gives poison apple to Snow White. 1
Dwarves put Snow White in a coffin 1
Prince sees her and falls in love 1
Snow White wakes up 1
Queen is banished 1
Prince and Snow White live happily ever after 1

Red Riding Hood Eval
Compression Ratio:   %29
Retention Ratio: 7/12  =%58
Main Points:
There is a girl named Red Riding Hood 1
Red Riding Hood is going to visit her grandmother. 1
Red Riding hood promises to be careful 1
Red Riding Hood meets a wolf 0
The wolf beats Red Riding Hood to Grandmother’s house. 0
The wolf eats the grandmother and impersonates her. 0
Little Red Riding Hood asks grandmother a series of questions. 1
Wolf attacks little red riding hood 1
A woodsman saves Red Riding Hood from the wolf. 0
Grandmother is saved 0
Red Riding Hood learns her lesson 1
Everyone lives happily ever after. 1

