By Guest on Thursday, 01 May 2014
Hi all Rapidminers!

Please check what accuracy and output you get when analyzing the subject and body separately, without concatenating them. You will have to try various combinations of weights for the subject and body, chosen so that the two weights sum to 1.0. For example, if you apply a weight of 0.6 to the body, you have to apply 0.4 to the subject. Note the accuracy you get using 10 validations, see how the test emails are classified, and compare the results. Please discuss the outputs here.
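
To make the weighting scheme above concrete, here is a minimal Python sketch. It is only an illustration: the thread does not show how the weights enter the RapidMiner process, so the weighted sum of term counts below is an assumption, and `weight_pairs` and `combine` are hypothetical helper names.

```python
from collections import Counter


def weight_pairs(steps=10):
    """Return (subject_weight, body_weight) pairs that sum to 1.0."""
    # Integer steps avoid accumulating floating-point error.
    return [(i / steps, (steps - i) / steps) for i in range(steps + 1)]


def combine(subject_tokens, body_tokens, subject_weight, body_weight):
    """Weighted sum of subject and body term counts (assumed scheme)."""
    subject_counts = Counter(subject_tokens)
    body_counts = Counter(body_tokens)
    terms = set(subject_counts) | set(body_counts)
    return {
        term: subject_weight * subject_counts[term]
        + body_weight * body_counts[term]
        for term in terms
    }


pairs = weight_pairs()
# Example from the post: 0.4 on the subject, 0.6 on the body.
vector = combine(["network", "outage"], ["network", "down", "network"], 0.4, 0.6)
```

Each pair in `pairs` is one combination to try; the subject and body stay separate inputs and are only mixed through their weights.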
Hi,

Please refer to the picture for the output I got. My best result came under the following conditions:

1. Validations: 10
2. Subject weight: 0.9
3. Body weight: 0.1
4. Operators used inside 'Process Documents from Data': Tokenize, Filter Stopwords, Filter Tokens by Length (lower limit: 3, upper limit: 999), and Filter Tokens by Content (string: www, with the 'inverse' condition).
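
For readers who want to see what the operator chain in item 4 does to a piece of text, here is a rough Python approximation. It is a sketch under assumptions, not the RapidMiner implementation: the stopword list is a tiny hypothetical one, tokenizing is assumed to split on non-letters, the length limits are assumed to be inclusive, and the 'inverse' condition is assumed to mean "keep tokens that do not contain www".

```python
import re

# Hypothetical stopword list for illustration; RapidMiner uses its own.
STOPWORDS = {"the", "a", "an", "is", "to", "and", "of", "for", "at", "on"}


def preprocess(text, min_len=3, max_len=999, blocked="www"):
    # Tokenize: split on anything that is not a letter, drop empty pieces.
    tokens = [t for t in re.split(r"[^A-Za-z]+", text) if t]
    # Filter Stopwords.
    tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    # Filter Tokens by Length (limits assumed inclusive).
    tokens = [t for t in tokens if min_len <= len(t) <= max_len]
    # Filter Tokens by Content, inverted: drop tokens containing `blocked`.
    tokens = [t for t in tokens if blocked not in t.lower()]
    return tokens


sample = preprocess("Visit www.example.com for the network status at 10")
```

The order of the steps follows the order in which the operators are listed above.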

Please post your outputs so we can discuss them and decide on the best combination.
Under these conditions, 2 texts were correctly predicted as Davison.
· 11 years ago · 0 Likes · 0 Votes · 0 Comments ·
The results change under two different conditions:
1. When 'Filter Stopwords' and 'Stem (Porter)' are used, emails are misclassified for Davison and the other categories.
2. When 'Filter Tokens by Length' and 'Filter Tokens by Content' are used, emails are classified more correctly.
· 11 years ago · 0 Likes · 0 Votes · 0 Comments ·
Stem (Porter) is clearly affecting the result, and, funnily enough, in a negative way.
· 11 years ago · 0 Likes · 0 Votes · 0 Comments ·
Davison: 4
Phone & Network: 4
Web: 3
Others: 9
Weight to Subject: 0.9
Weight to Body: 0.1
Model Accuracy: 71.71% +/- 8.44% (mikro: 71.63%)
Screenshot2014-05-0117.17.04.png
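
A note on reading the accuracy line: as far as I understand it, the first figure is the average of the per-fold accuracies with their spread, and the "mikro" figure is the accuracy over all examples pooled across folds. The sketch below shows the difference in Python. The fold counts are made up for illustration and are not the data from this thread, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

# Hypothetical (correct, total) counts for each of 10 validation folds.
folds = [(8, 10), (7, 10), (6, 10), (8, 11), (7, 10),
         (9, 10), (6, 10), (7, 10), (8, 10), (7, 11)]


def fold_average(folds):
    """Mean and spread of the per-fold accuracies."""
    accuracies = [correct / total for correct, total in folds]
    return mean(accuracies), pstdev(accuracies)


def micro_accuracy(folds):
    """Accuracy over all examples pooled across folds."""
    correct = sum(c for c, _ in folds)
    total = sum(n for _, n in folds)
    return correct / total


avg, spread = fold_average(folds)
micro = micro_accuracy(folds)
```

The two figures differ slightly whenever the folds are not all the same size, which would explain a small gap like 71.71% versus 71.63%.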
· 11 years ago · 0 Likes · 0 Votes · 0 Comments ·
Hi,

It seems that the model is working and is automated, although in some cases the emails are misclassified. This is the best result I am getting. See the attached XML files.

trainingdata_validationscheduled_unscheduled_xml.txt


test_output_xml.txt
· 11 years ago · 0 Likes · 0 Votes · 0 Comments ·