Data Format

The data files in the ChatterBot Corpus are formatted using YAML syntax. YAML is used because it is easy for both humans and machines to read.

Corpus Properties

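| Property      | Required | Description                                                                       |
|---------------|----------|-----------------------------------------------------------------------------------|
| categories    | Optional | A list of categories that describe the conversations.                             |
| conversations | Required | A list of conversations. Each conversation is itself a list of statements, in order. |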
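As an illustration of the shape these properties describe, the sketch below checks a corpus after it has been parsed into plain Python objects (dicts, lists, strings), for example with a YAML library such as PyYAML's `yaml.safe_load`. The function name `check_corpus_shape` and its messages are hypothetical, not part of ChatterBot. It only checks the types of whichever keys are present and does not decide which keys are mandatory.

```python
from typing import Any


def check_corpus_shape(data: Any) -> list[str]:
    """Return a list of problems found in a parsed corpus (empty if none).

    Hypothetical helper: `data` is assumed to be the result of parsing a
    corpus YAML file. Missing keys are not reported; only the types of
    keys that are present are checked.
    """
    if not isinstance(data, dict):
        return ["top level must be a mapping"]

    problems: list[str] = []

    categories = data.get("categories")
    if categories is not None and not (
        isinstance(categories, list) and all(isinstance(c, str) for c in categories)
    ):
        problems.append("categories must be a list of strings")

    conversations = data.get("conversations")
    if conversations is not None:
        if not isinstance(conversations, list):
            problems.append("conversations must be a list")
        else:
            for index, conversation in enumerate(conversations):
                # Each conversation must be a list of plain strings. An unquoted
                # YAML value such as `yes` may parse as a boolean and fail here.
                if not (
                    isinstance(conversation, list)
                    and all(isinstance(s, str) for s in conversation)
                ):
                    problems.append(f"conversation {index} must be a list of strings")
    return problems
```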

Here is an example of the corpus data:

```yaml
categories:
- english
- greetings
conversations:
- - Hello
  - Hi
- - Hello
  - Hi, how are you?
  - I am doing well.
- - Good day to you sir!
  - Why thank you.
- - Hi, How is it going?
  - It's going good, your self?
  - Mighty fine, thank you.
```
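In the `conversations` list, the outer dash of `- -` starts a new conversation and the inner dash starts its first statement. Later statements in the same conversation are indented two spaces. The hypothetical helper `to_corpus_yaml` below is a minimal sketch that produces this layout from Python lists. It is not a ChatterBot API. It assumes that quoting every string with `json.dumps` is acceptable, since YAML parsers accept such JSON-style strings as double-quoted scalars for ordinary text.

```python
import json


def to_corpus_yaml(categories: list[str], conversations: list[list[str]]) -> str:
    """Render categories and conversations in the corpus layout shown above.

    Hypothetical helper: every string is written as a double-quoted scalar,
    so punctuation such as ':' or '#' cannot change how it is parsed.
    """
    lines = ["categories:" if categories else "categories: []"]
    lines.extend(f"- {json.dumps(c, ensure_ascii=False)}" for c in categories)

    lines.append("conversations:" if conversations else "conversations: []")
    for conversation in conversations:
        if not conversation:
            lines.append("- []")  # an empty conversation is still one outer item
            continue
        for index, statement in enumerate(conversation):
            # The first statement opens a new outer item ("- - "); later
            # statements are indented two spaces as inner items ("  - ").
            prefix = "- - " if index == 0 else "  - "
            lines.append(prefix + json.dumps(statement, ensure_ascii=False))
    return "\n".join(lines) + "\n"


print(to_corpus_yaml(["english"], [["Hello", "Hi"]]), end="")
```

Quoting every value costs a little readability compared with the hand-written example. In exchange, the output never depends on which characters happen to appear in a statement.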

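Within each conversation, every statement after the first is treated as a response to the statement immediately before it. The values in this example therefore have the following relationships: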

Evaluated statement relationships

| Statement                   | Response                    |
|-----------------------------|-----------------------------|
| Hello                       | Hi                          |
| Hello                       | Hi, how are you?            |
| Hi, how are you?            | I am doing well.            |
| Good day to you sir!        | Why thank you.              |
| Hi, How is it going?        | It's going good, your self? |
| It's going good, your self? | Mighty fine, thank you.     |
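The relationships in this table can be reproduced by pairing each statement with the one that follows it in the same conversation. The sketch below does this with a hypothetical `response_pairs` function. It mirrors the table only. This section does not say whether ChatterBot's own trainers build their pairs in exactly this way, so treat it as an illustration.

```python
def response_pairs(conversations: list[list[str]]) -> list[tuple[str, str]]:
    """Return (statement, response) pairs for adjacent statements in each conversation."""
    pairs: list[tuple[str, str]] = []
    for conversation in conversations:
        # Pair each statement with the one that follows it. A conversation of
        # n statements gives n - 1 pairs; a single statement gives none.
        pairs.extend(zip(conversation, conversation[1:]))
    return pairs


example = [
    ["Hello", "Hi"],
    ["Hello", "Hi, how are you?", "I am doing well."],
    ["Good day to you sir!", "Why thank you."],
    ["Hi, How is it going?", "It's going good, your self?", "Mighty fine, thank you."],
]
for statement, response in response_pairs(example):
    print(f"{statement!r} -> {response!r}")
```

The result is a list rather than a dictionary because the same statement can have more than one response. In this example, "Hello" is answered both by "Hi" and by "Hi, how are you?".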