Wednesday, September 28, 2016

Comments on "To what extent can maintenance problems be predicted by code smell detection? – An empirical study"

Article: To what extent can maintenance problems be predicted by code smell detection? – An empirical study

I highly recommend that everyone read this article. It is long, but it contains valuable details of a great experiment. I doubt that anybody could conduct something similar, as it is very costly.

Some valuable quotes from the article:

Results: From the total set of problems, roughly 30% percent were related to files containing code smells. ......
Conclusions: The role of code smells on the overall system maintainability is relatively minor, thus complementary approaches are needed to achieve more comprehensive assessments of maintainability.
Moreover, to improve the explanatory power of code smells, interaction effects amongst collocated smells and coupled smells should be taken into account during analysis.

Is 30% of problems minor? That is 30% of problems that can be predicted automatically at the moment the code is written, without wasting the time of other engineers. I doubt that it is minor.

Let's take a deeper look at the report:

Twelve different code smells were detected in the systems via Borland Together and InCode.

Ancient tools, neither of them is alive now!
Where is the open-source stack of tools like Checkstyle, PMD, FindBugs, SonarQube, IntelliJ IDEA inspections, nondex, forbiddenapis, error-prone, ..... ?

We report on a multiple case study in which the problems and challenges faced by six developers working on four different Java systems were registered on a daily basis, for a period up to four weeks.
.....
there is a substantial body of work that investigates if certain source code characteristics (i.e., a code smell) affect a given maintenance outcome (e.g., effort, changes, defects).

The selected model is pretty impressive and is based on real code.

Individual progress meetings (20–30 min): were conducted daily, with each of the developers and the researcher present at the study to keep track of the progress, and register problems encountered during the project (ex. Dev: ‘‘It took me 3 h to understand this method. . .’’)

3 h is too much for one method. If understanding a method takes more than 20 min, or even 5 min, it is a problem and a code smell.

Code smells automatically detected in the systems:
Data Class
Data Clumps
Duplicated code in conditional branches
Feature Envy
God (Large) Class
God (Long) Method
Misplaced Class
Refused Bequest
Shotgun Surgery
Temporary variable for various purposes
Use interface instead of implementation
Interface Segregation Principle (ISP) Violation

It is strange that such a small set of code smells was detected; it looks like the whole list of smells was not that big. It is not clear why minor smells are ignored in the investigation. If you have 5K minor problems in a project, you have a lot of problems, and you will pay for that on a daily basis. A minor problem trips up an engineer for 2 min, 5 min .... but that time is valuable too! By the end of the day it could add up to 30 min or even an hour.
Minor problems cannot be ignored, as they tire and slightly demotivate engineers, so overall performance will be slightly lower.

In total, 137 different problems were identified from the different maintenance projects. From the total, 64 problems (47%) were associated to Java source code. The remaining 73 (53%) constituted problems not directly related to code such as: lack of adequate technical infrastructure, developer’s coding habits, external services, runtime environment, and defects initially present in the system.

Almost 50% of the problems could be found automatically. And by the way, "developer’s coding habits" is also a code smell.

From the total set of difficulties identified during maintenance, less than half (43%) were related to Java code, and from those, only 58% clearly related to any of the twelve code smells used to analyze the code. This means that even if we count those difficulties that are due to combination of factors, roughly only 30% of the total set of difficulties can be explained and potentially foreseen by code smells. As a result, we conclude that the subset of aspects that are covered by current code smell detection has a relatively low
...
The results from our study reminds us of the limitations of evaluations based purely on static analysis and suggest the need of more comprehensive quality models and techniques that can incorporate the analysis of diverse factors.

Even on average it is 30% for purely static analysis ..... is that low? Just offer your manager a 30% boost in development productivity that could be achieved by using free tools in CI or IDE integrations!
I doubt the manager would say "Do not bother with this."
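
To see where the "roughly 30%" comes from, here is a small sketch that redoes the arithmetic from the raw counts in the quotes above (137 problems in total, 64 related to Java code, 58% of those related to the twelve smells). Note that the quoted passages give the code-related share as 47% in one place and 43% in another; the sketch only uses the counts. The helper name pct is made up for this example.

```shell
# Figures taken from the quotes above; the helper name "pct" is hypothetical.
total=137         # all registered maintenance problems
code_related=64   # problems associated with Java source code
smell_share=0.58  # share of code-related problems tied to the twelve smells

pct() {
    # Print numerator/denominator as a whole-number percentage.
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.0f%%\n", n / d * 100 }'
}

# Code-related share of all problems.
pct "$code_related" "$total"

# Number of code-related problems tied to smells, rounded to a whole problem.
smell_related=$(awk -v c="$code_related" -v s="$smell_share" 'BEGIN { printf "%.0f", c * s }')

# Smell-related share of all problems.
pct "$smell_related" "$total"
```

With these inputs it prints 47% and 27%, which is the "roughly 30%" of the article.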

Thursday, April 28, 2016

How to do ssh-copy-id on SunOS

There is no ssh-copy-id command on SunOS (Solaris, SmartOS).

How to copy your public key from SunOS (Solaris, SmartOS) to another server without ssh-copy-id:
cat ~/.ssh/id_rsa.pub | ssh user@server "mkdir -p ~/.ssh; cat >> ~/.ssh/authorized_keys"
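
If you do this often, the one-liner above can be wrapped into a small function. This is only a sketch: the name copy_ssh_key is made up, and the umask 077 part is my assumption that you want a freshly created ~/.ssh to be private (sshd commonly refuses keys from directories with loose permissions).

```shell
# copy_ssh_key KEYFILE USER@HOST
# Hypothetical helper: appends a public key to authorized_keys on the remote host.
copy_ssh_key() {
    key_file=$1
    target=$2
    if [ ! -r "$key_file" ]; then
        echo "cannot read public key: $key_file" >&2
        return 1
    fi
    # umask 077 makes a newly created ~/.ssh and authorized_keys private;
    # the key itself is sent over stdin, exactly as in the one-liner.
    ssh "$target" 'umask 077; mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys' < "$key_file"
}
```

Usage: copy_ssh_key ~/.ssh/id_rsa.pub user@server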

Friday, April 15, 2016

How to print the whole command from ps on SunOS (Solaris, SmartOS)

[sb@c-apps01 ~]$ uname -a
SunOS c-apps01 5.11 joyent_20160121T174331Z i86pc i386 i86pc Solaris

[sb@c-apps01 ~]$ ps -aef | grep mail
      sb 52561  5477   0 18:47:58 pts/38      0:00 grep mail
      sb 38414 38411   0 18:30:41 ?           0:03 /opt/java8/bin/java -mx256m -classpath :/www/mail-notifier/lib/act
      sb 38411 38408   0 18:30:40 ?           0:00 bash /www/notifier/sh/_shell.sh --processor com.revere.auth.j
      sb 43546 48142   0 14:38:12 ?           0:00 /www/bin/wd -p /www/sb/jobs/notifier/w.pid -- /www/sb
      sb 38408 43546   0 18:30:40 ?           0:00 bash /www/notifier/sh/listener.sh


As you might notice, all commands are cut to 80 symbols.


The answer is the (undocumented) "w" option:
ps axwwl



Side note:
how to list all open ports on SunOS: "lsof -P"

Sunday, February 7, 2016

Master development problem with pictures

We need to decide: should Jenkins test the pre-rollout state or the post-rollout state of the sources?

Workflow:  master --> development --> topicBranch
Advantages:
Simple examination of exception stack traces from PROD (only requires a switch to master).
A new iteration/feature comes with a version bump, so the PROD code could be bumped with mvn-release-plugin without worrying about interfering with development.
Only 2 branches are required.
Disadvantages:
master does not correspond to PROD after a release was rolled back on PROD!!!!! There is no way to revert/reset master without pain for all developers (conflicts are unavoidable).
The person who does the rollout to PROD has to remember to merge development to master before the rollout to PROD - RomanIvanov always fails to do this on time. The rollout could be done outside working hours by SysAdm, so it is impossible to sync these events.
"Hot fix branches" (http://git.server.com/cgit/ld/log/?h=master-1.7-hotfix) are better than making changes in master and then force-rebasing development, which might not be that simple. But then two master branches ("master", "master-hotfix") appear and the same problem is raised: "Where is the prod code?"
In case of a hotfix over "master", Jenkins has no easy way to test the changes before rollout (if you follow the model of merging to master after rollout) ...
Any change to master, with development far ahead, makes it impossible to keep fast-forward merges.
Keeping a master-1.7-hotfix branch is not good for Jenkins as it is not aware of it, so you deploy changes that are not tested.

Problem: we released repgen to PROD but not RP, so I cannot merge to master as these two projects live in the same repo. Is it a git structure problem or a problem of your workflow?
Problem: reports 1.3 and 1.4 - we cannot do a fast-forward merge as we have conflicting changes (we did a cherry-pick from master to development), and we cannot just merge without conflicts. With a non-fast-forward merge we will lose all commit history, as all of the commits will come in as one merge commit, and tags will be lost as we change the SHA1 of commits. Should we reset master to the common commit and then do a fast-forward merge? Should we store all hot fixes? Should we name them like a branch and keep them in git for a while? But in that case, how is it different from the alternative git workflow?
Problem: 'themes' - a fix was applied to master and then a few days later was applied to development. On merging development to master I got a conflict. I could resolve it, but then master would get a merge commit. I could revert that fix in master, which would help me merge without problems, but there would still be one merge commit (history is lost). I did a HACK by resetting master to the common commit with development, and that allowed me to do a fast-forward (master keeps history).  Summary: it looks like to keep history we need to work in master, or treat the development branch as the most valuable branch (one that never has to be removed or lost).
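
For the record, the HACK from the 'themes' problem can be sketched like this. It is only an illustration under my assumptions: the function name ff_master_to is made up, and the commits that exist only on master must already be present on the other branch (e.g. via cherry-pick), because the reset drops them from master.

```shell
# ff_master_to BRANCH
# Hypothetical helper. WARNING: commits that exist only on master are
# dropped from master by the hard reset.
ff_master_to() {
    branch=$1
    # Last commit that master and the other branch have in common.
    base=$(git merge-base master "$branch") || return 1
    git checkout -q master || return 1
    # Move master back to the common commit ...
    git reset -q --hard "$base" || return 1
    # ... so that the merge can be a pure fast-forward (no merge commit).
    git merge -q --ff-only "$branch"
}
```

Usage: ff_master_to development. If master was already pushed, other developers will have to reset their local master too, which is exactly the pain described in the disadvantages above.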

Dilemma: how to merge from development to master? Up to a tag (3.1.6) or up to the next commit after the tag?
In case of "next commit after tag" - it is convenient for versioning as we stay on SNAPSHOT.
In case of "up to tag" - it is more correct as we rolled out that exact version. It makes even more sense when it comes to creating master-X.X.X-hotfix, where it would otherwise be unclear in history why it was required to revert the version from 3.1.7-SNAPSHOT to 3.1.6.1-SNAPSHOT, as 3.1.7 has never existed in master.
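
A minimal sketch of the "up to tag" variant, assuming the tag lives on development and master can be fast-forwarded to it. The function name merge_master_up_to is made up for this example.

```shell
# merge_master_up_to TAG
# Hypothetical helper: fast-forwards master to the tagged commit only,
# leaving later commits (e.g. the next SNAPSHOT bump) on development.
merge_master_up_to() {
    tag=$1
    git checkout -q master || return 1
    # ^{commit} peels an annotated tag to the commit it points at;
    # --ff-only refuses to create a merge commit.
    git merge -q --ff-only "refs/tags/$tag^{commit}"
}
```

Usage: merge_master_up_to 3.1.6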

Problem: master, master-3.2.12-hotfix, master-3.2-hotfix (all branches were created as mentioned here) - master-3.2.12-hotfix contains tag 3.2.12.3 that was rolled out to PROD. After the rollout from master-3.2-hotfix by tag 3.2.14 we cannot remove the master-3.2.12-hotfix branch as it contains a tag that could be useful in case of a rollback on PROD.
The command "git merge -s ours master-3.2.12-hotfix" will not help: it does exactly what we need in terms of keeping commits/tags in history, but it DOES CREATE AN ADDITIONAL MERGE commit, which will cause problems when it comes to the development->master merge.

Problem: master and master-hotfix - master-hotfix depends on DB changes that are incompatible with the master code. So we have a master that is failing on UTs while master-hotfix is not. So the model of checking out "master*" in Jenkins does not help.

!!Problem!!: if the repository is a composite of several projects that could be released/rolled out/tagged separately (ld and ld-data-job), we cannot merge everything to master: ld-data-job is rolled out to PROD, but some changes in ld-server are not rolled out and cannot be merged to master as that breaks the main idea that master === PROD code.

Workflow : hot-fix branch <-- master --> topicbranch
Advantages:
...
Only 1 common branch is required.
Disadvantages:
It is not clean when hot-fix branches are removed, but that will not be a problem.
Problem: we did not bump the minor version in the pom until rollout to PROD. We did a few rollouts with build number increases but were still not ready for PROD, and now we have a PROD critical issue. We go back to the PROD revision to create a branch, but we cannot use the maven:release plugin as the next few build numbers are already occupied by rollouts to dev. Is overriding jars in Maven safe, can we trust this?
How will we make sure that UTs pass for the PROD version before rollout to PROD? I am not sure that a developer will remember to go to Jenkins, change the config to the new commit reference and launch the configuration.... but I could be wrong here.

To make development and fix delivery easy and fast, it is highly desirable to have a hot-fix branch to backport some features/fixes and roll them out more frequently, without being blocked by the changes of the next rollout.

Alternatives:
http://blogs.atlassian.com/2013/05/maven-git-flow-plugin-for-better-releases/

Performance problem while using Checks that extend AbstractJavadocCheck

Get all Javadoc Checks that use the Grammar parser for Javadoc:
~/java/git-others/checkstyle/checkstyle [master|✔] $ grep -l -r --include "*.java" --exclude "*Test.java" "extends AbstractJavadocCheck" . | sed -E 's/.\/src\/.*javadoc\///' | sed "s/Check.java//"
JavadocTagContinuationIndentation
NonEmptyAtclauseDescription
SingleLineJavadoc
JavadocParagraph
SummaryJavadoc
AtclauseOrder


Execution test that shows a 15x performance degradation (19 sec vs 5 min 8 sec):

$ grep -E "JavadocTagContinuationIndentation|NonEmptyAtclauseDescription|SingleLineJavadoc|JavadocParagraph|SummaryJavadoc|AtclauseOrder|AbstractJavadoc" google_checks.xml 
        <module name="NonEmptyAtclauseDescription"/>
        <module name="JavadocTagContinuationIndentation"/>
        <module name="SummaryJavadoc">
        <module name="JavadocParagraph"/>
        <module name="AtclauseOrder">
        <module name="SingleLineJavadoc">
$ grep -E "JavadocTagContinuationIndentation|NonEmptyAtclauseDescription|SingleLineJavadoc|JavadocParagraph|SummaryJavadoc|AtclauseOrder|AbstractJavadoc" google_checks_only-one-javadoc-check.xml

        <module name="NonEmptyAtclauseDescription"/>
$ grep -E "JavadocTagContinuationIndentation|NonEmptyAtclauseDescription|SingleLineJavadoc|JavadocParagraph|SummaryJavadoc|AtclauseOrder" google_checks_no-javadoc-checks.xml
$

$ time java -jar checkstyle-6.15-all.jar -c google_checks_no-javadoc-checks.xml guava/
Starting audit...
Audit done.

real 0m19.532s
user 0m56.985s
sys 0m0.807s

$ time java -jar checkstyle-6.15-all.jar -c google_checks.xml guava/
Starting audit...
Audit done.

real 5m8.913s
user 6m25.283s
sys 0m1.287s

$ time java -jar checkstyle-6.15-all.jar -c google_checks_only-one-javadoc-check.xml guava/
Starting audit...
Audit done.

real 5m7.450s
user 6m14.414s
sys 0m1.179s
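
The slowdown factor can be computed from the "real" times above. This is just a sketch of the arithmetic; the helper name to_seconds is made up.

```shell
# to_seconds is a hypothetical helper: it converts the "real" format
# printed by time, e.g. 5m8.913s, into plain seconds.
to_seconds() {
    echo "$1" | awk '{
        split($0, part, "m")      # part[1] = minutes, part[2] = seconds + "s"
        sub(/s$/, "", part[2])    # drop the trailing "s"
        printf "%.3f\n", part[1] * 60 + part[2]
    }'
}

baseline=$(to_seconds 0m19.532s)     # google_checks_no-javadoc-checks.xml
with_javadoc=$(to_seconds 5m8.913s)  # google_checks.xml

# Ratio of the two wall-clock times.
awk -v slow="$with_javadoc" -v fast="$baseline" 'BEGIN { printf "%.1fx slower\n", slow / fast }'
```

For the numbers above this prints 15.8x slower.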



The only excuse for such performance degradation is that parsing is done correctly, in comparison to the RegExp implementation of Javadoc parsing, which has an unknown number of bugs.

This needs to be fixed.....

How to get statistics from checkstyle-result.xml


grep 'source=' checkstyle-result.xml | sed -e 's/.*source="//' -e 's/".*//' -e 's/Check$//' -e 's/^com\.puppycrawl\.tools\.checkstyle\.checks\.//' | sort | uniq -c | sort -n