From version < 6.1
edited by Paul Libbrecht
on 2025/04/17 21:43
To version < 2.1 >
edited by Paul Libbrecht
on 2025/04/12 22:30
Change comment: There is no comment for this version

Page properties
Content
... ... @@ -1,52 +1,1 @@
1 -This page describes what an _AISOP domain_ is: a project that reflects the course material, and how it is used in the AISOP webapp.
2 -
3 -### Purpose of a domain
4 -
5 -A domain contains all course-specific information used by the AISOP web-application, so that the application can be applied to a seminar, display analyses, and classify the portfolios made in these courses in a meaningful manner.
6 -
7 -### Ingredients of a domain
8 -
9 -Minimum: this allows the AISOP web-app to run:
10 -
11 -* title and description
12 -* set of labels of level 1 and of other levels
13 -* concept-map (exported in CXL) where the labels are nodes
14 -* spaCy models that allow the text-classification of paragraphs of portfolios made in this course: one model for level 1, and one for each of the level 1 topics (to classify its subtopics)
15 -* sequence of analysis scripts and aggregation scripts to deliver the portfolio explorer and portfolio dashboard
16 -
17 -Optional: this allows others to further develop the domain
18 -
19 -* source content which contains the sentences used for training
20 -* extracted sentences/fragments
21 -* annotations for these extracted sentences within the labels of level 1 and the others
22 -* annotation statistics and model training results (in the form of statistics)
23 -* test sentences to verify the proper elementary function of the classifiers
24 -* test portfolios to verify the proper function
25 -
26 -### Packaging of a domain
27 -
28 -We propose that a domain be packaged as a directory that can be shared as a repository and that is organized as follows:
29 -
30 -* the directory-name reflects the course name
31 -* the directory contains a file `about.json` with the properties `title`, `description`, `language` (as an ISO 639-3 code), an array of strings for the authors, `subjects` (an array of strings containing the LC subject classification), and `logo` (the link to a logo)
32 -* the directory contains a file `license.txt` with the license text
33 -* the directory contains a `labels.txt` file with a list of label names, organized in a hierarchy by simple indenting
34 -* the directory contains a file `pipeline.json` with the steps of the analysis and aggregation
35 - * the pipeline refers to the included spaCy model directories (one for level 1 and one for the children of each level 1 label)
36 -* the concept-map used, named `cmap.cxl`, and its source `cmap.cmap` (for development)
37 -* all models:
38 - * the `l1-model` directory is the spaCy model of the classifier for the l1-topics
39 - * the `l2-models` directory contains one directory per l1-topic, each holding a spaCy model for the sub-labels of that l1-topic
40 -* any extra file or directory mentioned as link
41 -* the `tests.txt` file contains the test fragments so that the debug tool can be used right away, one line per fragment
42 -* the `log.txt` file contains the statistical output of the training and/or statistics: one line per label, one column per dimension
43 -* optionally, any file used for development, documented by a `README.md` (see below)
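
The indentation-based hierarchy of `labels.txt` could be read as in the following minimal sketch. It assumes one label per line and that children are indented deeper than their parent (spaces, or tabs expanded to four columns); the function name `parse_labels` and the sample label names are hypothetical and not part of the AISOP web-app.

```python
def parse_labels(text):
    """Parse indented label names into (level, name, parent) tuples.

    Assumption: a label is a child of the closest preceding label
    that is indented less than itself.
    """
    entries = []
    stack = []  # (indent, name) pairs of the current chain of ancestors
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip empty lines
        expanded = raw.expandtabs(4)
        indent = len(expanded) - len(expanded.lstrip())
        name = expanded.strip()
        # Drop ancestors that are at the same depth or deeper.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1] if stack else None
        stack.append((indent, name))
        entries.append((len(stack), name, parent))
    return entries


# Hypothetical sample content for a labels.txt file.
sample = "Algebra\n  Groups\n  Rings\nAnalysis\n  Limits\n"
for level, name, parent in parse_labels(sample):
    print(level, name, parent)
```

Level 1 entries would correspond to the labels handled by the `l1-model`, and the entries below each of them to the labels of the matching model in `l2-models`.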
44 -
45 -All link paths used in the `about.json` and `pipeline.json` files can be resolved relative to the domain directory. For them to be recognized, we recommend writing relative paths with a leading `./`, as in `"logo":"./my-logo.svg"`. This allows the web-app to resolve each relative path in a secure way (not going outside of the domain directory except for known places) before the path is given to the web-server or to the analysis scripts.
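
A relative resolution of this kind might look like the minimal sketch below. It is an illustration and not the web-app's actual code: the function name `resolve_in_domain` is hypothetical, and the exception for "known places" outside the domain directory is left out.

```python
from pathlib import Path


def resolve_in_domain(domain_dir, link):
    """Resolve a './'-prefixed link against the domain directory.

    Returns None for links without the './' prefix (e.g. absolute URLs),
    and raises ValueError if the resolved path leaves the domain directory.
    """
    if not link.startswith("./"):
        return None  # not marked as relative; left to the caller
    root = Path(domain_dir).resolve()
    candidate = (root / link).resolve()  # collapses '.' and '..' segments
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes the domain directory: {link}")
    return candidate


if __name__ == "__main__":
    # Hypothetical domain directory name.
    print(resolve_in_domain("my-course", "./my-logo.svg"))
```

Resolving before the containment check matters: a link such as `./../other/file` starts with `./` but ends up outside the domain directory once the `..` segment is collapsed.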
46 -
47 -While the README.md should be the main entry point for the source work for creating the domain, we propose the following folder names:
48 -
49 -- `source-content`: a collection of files (e.g. PDFs, pictures, texts, pptx, ...) that represent the source input from where an extraction is made
50 -- `extracts` is the result of the extraction process and is made of JSON files, one, or one folder, per source collection
51 -- `annotations` is the result of the annotations exported from prodigy in the form of JSONL files
52 -- moreover, the instructions used and the log of all processes are visible in the `README.md` file
1 +This page describes what a _domain_ is: a project that reflects the course material, and how it is used in the AISOP webapp.
