Wiki source code of AISOP domains

Last modified by Paul Libbrecht on 2025/04/17 21:43

This page describes what an _AISOP domain_ is, namely a project that reflects the course material, and how it is used in the AISOP web-app.

### Purpose of a domain

A domain contains all the course-specific information that the AISOP web-app uses so that it can be applied to a seminar, display analyses, and classify the portfolios written in these courses in a meaningful manner.

### Ingredients of a domain

Minimum (what the AISOP web-app needs in order to run):

* a title and a description
* a set of labels of level 1 and of the other levels
* a concept map (exported in CXL) in which the labels are nodes
* spaCy models for the text classification of the paragraphs of portfolios made in this course: one for level 1, and one for each level-1 topic (to classify its subtopics)
* a sequence of analysis scripts and aggregation scripts that deliver the portfolio explorer and the portfolio dashboard
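
The two-level classification described in the list above can be sketched as follows. This is not the web-app's code. It assumes that each model behaves like a trained spaCy text classifier, whose label scores are read from `Doc.cats`, and that the highest-scoring label is kept at each level. The names `best_label`, `classify_paragraph` and `spacy_classifier` are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

# A classifier maps a paragraph to label scores, like spaCy's Doc.cats.
Classifier = Callable[[str], Dict[str, float]]


def best_label(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the highest-scoring label and its score."""
    label = max(scores, key=scores.get)
    return label, scores[label]


def classify_paragraph(
    text: str,
    l1_model: Classifier,
    l2_models: Dict[str, Classifier],
) -> Tuple[str, Optional[str]]:
    """Hypothetical two-level classification of one portfolio paragraph.

    The level-1 model picks a topic; the model trained on that topic's
    children then picks a subtopic (None if the topic has no level-2 model).
    """
    l1_label, _ = best_label(l1_model(text))
    sub_model = l2_models.get(l1_label)
    if sub_model is None:
        return l1_label, None
    l2_label, _ = best_label(sub_model(text))
    return l1_label, l2_label


def spacy_classifier(model_path: str) -> Classifier:
    """Wrap a trained spaCy text-classification model stored in a directory."""
    import spacy  # imported lazily so the rest of the sketch runs without spaCy

    nlp = spacy.load(model_path)
    return lambda text: dict(nlp(text).cats)
```

For a packaged domain, the level-1 classifier might be built with `spacy_classifier("./l1-model")`, and the level-2 classifiers as a dictionary that maps each level-1 topic to the classifier loaded from its model directory. Keeping only the top label is a simplification. A real pipeline might keep every label whose score passes a threshold.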

Optional (what allows others to develop the domain further):

* the source content containing the sentences used for training
* the extracted sentences or fragments
* annotations of these extracted sentences with the labels of level 1 and of the other levels
* annotation statistics and model-training results (as statistics)
* test sentences to verify that the classifiers work properly on elementary inputs
* test portfolios to verify that the domain works properly as a whole

### Packaging of a domain

We propose that a domain be packaged as a directory that can be shared as a repository and that is organized as follows:

* the directory name reflects the course name
* the directory contains a file `about.json` with the properties `title`, `description`, `language` (an ISO 639-3 code), an array of strings listing the authors, `subjects` (an array of strings containing LC subject classifications), and `logo` (the link to a logo)
* the directory contains a file `license.txt` with the license text
* the directory contains a file `labels.txt` with a list of label names, organized hierarchically by simple indentation
* the directory contains a file `pipeline.json` with the steps of the analysis and aggregation
* the pipeline refers to the spaCy model directories (one for level 1 and one for the children of each level-1 label), which are included in the domain
* the concept map, called `cmap.cxl`, and its source `cmap.cmap` (for development)
* all models:
    * the `l1-model` directory contains the spaCy model of the classifier for the level-1 topics
    * the `l2-models` directory contains, for each level-1 topic, a directory with a spaCy model for the sub-labels of this topic
* any extra file or directory referenced by a link
* the `tests.txt` file contains the test fragments, one per line, so that the debug tool can be used right away
* the `log.txt` file contains the statistical output of the training and/or of the annotation statistics: one line per label, one column per dimension
* optionally, any file used for development, documented by a `README.md` (see below)
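
A consistency check for this organization could look like the sketch below. It is hypothetical and rests on three assumptions:

* `labels.txt` is indented with spaces or tabs, and unindented lines are the level-1 labels; deeper lines are collected as sub-labels of their level-1 ancestor.
* Each sub-directory of `l2-models` is named exactly after its level-1 topic.
* Only level-1 topics that have sub-labels need a level-2 model.

This page does not name the `about.json` property that holds the authors, so the sketch does not check it. The function names are illustrative.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional

# Files that the organization above lists for every domain.
REQUIRED_FILES = ["about.json", "license.txt", "labels.txt", "pipeline.json", "cmap.cxl"]
# Named about.json properties (the authors' property name is not given, so it is skipped).
ABOUT_KEYS = ["title", "description", "language", "subjects", "logo"]


def parse_labels(text: str) -> Dict[str, List[str]]:
    """Map each level-1 label to the labels indented below it."""
    hierarchy: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        name = line.strip()
        if not name:
            continue  # skip blank lines
        if line[0] in " \t":
            # Any indented line, whatever its depth, counts as a sub-label
            # of the most recent level-1 label.
            if current is None:
                raise ValueError(f"indented label before any level-1 label: {name!r}")
            hierarchy[current].append(name)
        else:
            current = name
            hierarchy[current] = []
    return hierarchy


def check_domain(domain: Path) -> List[str]:
    """Return the problems found in a domain directory (empty if none)."""
    problems = [f"missing {name}" for name in REQUIRED_FILES if not (domain / name).is_file()]
    if not (domain / "l1-model").is_dir():
        problems.append("missing l1-model directory")
    about_file = domain / "about.json"
    if about_file.is_file():
        about = json.loads(about_file.read_text(encoding="utf-8"))
        problems += [f"about.json lacks {key!r}" for key in ABOUT_KEYS if key not in about]
    labels_file = domain / "labels.txt"
    if labels_file.is_file():
        for topic, children in parse_labels(labels_file.read_text(encoding="utf-8")).items():
            # Assumption: only level-1 topics with sub-labels need a level-2 model.
            if children and not (domain / "l2-models" / topic).is_dir():
                problems.append(f"missing l2-models/{topic}")
    return problems
```

On a complete domain, a call such as `check_domain(Path("./my-course"))` (the directory name is illustrative) should return an empty list.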

All link paths used in the `about.json` and `pipeline.json` files can be resolved relative to the domain directory. So that such paths are recognized as relative, we recommend writing them with a leading `./`, as in `"logo":"./my-logo.svg"`. This lets the web-app resolve them securely (never leaving the domain directory, except for known places) before handing them to the web server or to the analysis scripts.
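
A minimal sketch of such a secure resolution is shown below. It assumes Python 3.9+ (for `Path.is_relative_to`). The function name `resolve_domain_link` is hypothetical, and the allow-list of "known places" outside the domain is left out.

```python
from pathlib import Path


def resolve_domain_link(domain_dir: str, link: str) -> Path:
    """Resolve a './'-prefixed link from about.json or pipeline.json.

    Raises ValueError for links that are not marked as relative or that
    would escape the domain directory (e.g. './../secret.txt').
    """
    if not link.startswith("./"):
        raise ValueError(f"not a domain-relative link: {link!r}")
    root = Path(domain_dir).resolve()
    # resolve() normalizes '..' segments and follows symbolic links.
    target = (root / link[2:]).resolve()
    if not target.is_relative_to(root):  # Python 3.9+
        raise ValueError(f"link escapes the domain directory: {link!r}")
    return target
```

The containment test runs after `Path.resolve`, so a symbolic link inside the domain that points outside of it is rejected as well.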

While the `README.md` should be the main entry point for the source work behind the domain, we propose the following folder names:

- `source-content`: a collection of files (e.g. PDFs, pictures, texts, PPTX, ...) that represent the source input from which extracts are made
- `extracts`: the result of the extraction process, made of JSON files (one file, or one folder, per source collection)
- `annotations`: the annotations exported from Prodigy, as JSONL files
- moreover, the instructions used and the log of all processes are documented in the `README.md` file
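
Annotation statistics can be derived from the exported JSONL files, for example by counting the accepted examples per label. This sketch assumes Prodigy's usual text-classification output: each line carries an `answer`, and the labels appear either as a single `label` (binary annotation) or as an `accept` list (choice interface). `label_counts` is a hypothetical helper.

```python
import json
from collections import Counter
from typing import Iterable


def label_counts(jsonl_lines: Iterable[str]) -> Counter:
    """Count accepted annotations per label in Prodigy-style JSONL lines."""
    counts: Counter = Counter()
    for line in jsonl_lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("answer") != "accept":
            continue  # skip rejected or ignored examples
        if "accept" in record:
            labels = record["accept"]  # choice interface: list of accepted labels
        elif "label" in record:
            labels = [record["label"]]  # binary interface: one label per example
        else:
            labels = []
        counts.update(labels)
    return counts
```

Since a file object iterates over its lines, the helper could be called as `label_counts(open("annotations/example.jsonl", encoding="utf-8"))`; the file name is illustrative.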
