News for Package 'topicmodels'
Changes in topicmodels version 0.2-17
Internal changes to the C++ code to make use of R_NO_REMAP by prepending Rf_ to R API function calls.
Changes in topicmodels version 0.2-16
corpus.JSS.papers was removed as a suggested package and the dataset was added to the package to ensure that the vignette code can be executed successfully even if that package is not available for installation.
Changes in topicmodels version 0.2-15
Clarified the documentation for class "LDA_Gibbscontrol" to indicate that iter refers to the number of iterations in addition to the burnin iterations specified.
A typo in ctm.c that referred to long int where only int is used was resolved.
Changes in topicmodels version 0.2-14
The system requirement C++11 was removed from the DESCRIPTION file.
Changes in topicmodels version 0.2-13
sprintf was replaced by snprintf in the external code to avoid warnings.
build_graph was removed due to the archival of lasso2 on CRAN.
Changes in topicmodels version 0.2-12
Maintainer e-mail changed.
Changes in topicmodels version 0.2-11
The length limit for file names in the external code was extended from 100 to 260 characters, and a check for the length of the prefix was included. Thanks to Julia Silge for pointing the issue out.
Changes in topicmodels version 0.2-10
The number of symbols and functions exposed by the external code was reduced. Thanks to Prof. Ripley for pointing out the problem.
Changes in topicmodels version 0.2-9
Installation failures with gcc trunk (aka gcc 10) were fixed. Thanks to Prof. Ripley for pointing out the problem.
The Authors field was improved to also list the contributors of the external code.
Changes in topicmodels version 0.2-8
Improved protection of R objects in the external code.
Changes in topicmodels version 0.2-7
The code shown for either installing package corpus.JSS.papers and loading the data from it, or obtaining the data with package OAIHarvester, is now the same as the code actually used in the vignette.
Changes in topicmodels version 0.2-6
Protection of R objects was added in the external code. Thanks to Tomas Kalibera for pointing out the potential problems.
Changes in topicmodels version 0.2-5
C++11 added to SystemRequirements. Thanks to Prof. Ripley for pointing out the problem.
The vignette was slightly modified with respect to the retrieval of topics for Volume 24. This was necessary because JSS now uses different identifiers following the change of its web page. Thanks to Bruce Spencer for pointing the problem out.
The package now uses registration for native (C) routines.
Changes in topicmodels version 0.2-4
Issues concerning memory deallocation in the C++ code were fixed. Thanks to Prof. Ripley for pointing the problem out and providing log files to help to identify the problem.
Changes in topicmodels version 0.2-3
Issues concerning memory deallocation in the C and C++ code and the inclusion of headers were fixed. Thanks to Prof. Ripley for pointing the problem out, giving advice on how to fix the issues and providing log files to help identify the problems.
Changes in topicmodels version 0.2-2
A bug in the CTM implementation which led to unnecessary memory use was fixed. Thanks to Florian Schwendinger for pointing the issue out.
Functions from package stats are now correctly imported before being used.
Changes in topicmodels version 0.2-1
tm version >= 0.6 is required.
The data set AssociatedPress as well as other code checking document-term matrices now conforms to the data structure of document-term matrices in tm version >= 0.6.
Changes in topicmodels version 0.2-0
The specification of a seed for Gibbs sampling now leads to a call to set.seed, and the external code used for fitting accesses the state of the R random number generator. The seed can also be set to NA (default) in order not to change the seed of the R random number generator when fitting the model.
The Gibbs sampling method for fitting the LDA model now also returns the current topic assignments for all words, which allows Gibbs sampling to be initialized using either the current term distribution of topics or these assignments.
The Gibbs sampling method for fitting the LDA model now allows seed words to be specified, i.e., higher a-priori weights can be assigned to some words for some topics.
The word assignment matrix contained in the fitted models no longer has any dimnames.
Package corpus.JSS.papers is now listed in the DESCRIPTION file together with the information that it is available from the additional repository https://datacube.wu.ac.at.
Changes in topicmodels version 0.1-12
Package topicmodels now depends on package methods instead of importing it.
Changes in topicmodels version 0.1-11
Package SnowballC is now suggested instead of Snowball.
Changes in topicmodels version 0.1-10
A check was added to ensure that no empty documents are in the data. Thanks to Terry Therneau for pointing the problem out.
The first argument in the functions printf_vector and printf_matrix defined in the C code for the CTM was corrected to be const char *. Thanks to Murray Stokely for providing the patch.
Changes in topicmodels version 0.1-9
A bug in function posterior was fixed where the rownames of the wrong object were used. Thanks to Benjamin S. Porter for pointing the problem out.
The dependency structure was changed such that some packages are now only imported.
The information printed during the VEM algorithm when verbose is larger than 0 was improved.
Changes in topicmodels version 0.1-8
The code in the vignette for removing HTML markup was modified due to changes in package XML.
Changes in topicmodels version 0.1-7
A memory leak in the code of the fit function for LDA with method "VEM" was corrected. Thanks to Ramis Yamilov for pointing the problem out.
Changes in topicmodels version 0.1-6
The included dataset AssociatedPress had row names which were of type integer and not of type character. The object was re-saved omitting the row names.
Changes in topicmodels version 0.1-5
Vignettes moved from /inst/doc to /vignettes.
The source code for fitting the model using Gibbs sampling was modified because the code did not compile on Solaris. Thanks to Prof. Brian D. Ripley for pointing the problem out.
dtm2ldaformat() was modified to ensure that the resulting matrices for the documents contain integers. In addition, dtm2ldaformat() and ldaformat2dtm() were changed to also work for document-term matrices containing empty documents, and an argument was introduced to indicate whether empty documents should be removed. Thanks to Eu Jin Lok for pointing the problems out.
Changes in topicmodels version 0.1-4
Missing 'Suggests' entries added in the DESCRIPTION file. Thanks to Prof. Brian D. Ripley for pointing the problem out.
Changes in topicmodels version 0.1-3
Name tags for Rd files changed to not contain slashes. Thanks to Prof. Brian D. Ripley for pointing the problem out as indicated in bug PR14707.
Changes in topicmodels version 0.1-2
A small bug was fixed when saving interim results for fitting an LDA model using Gibbs sampling. Thanks to Nicholas Switanek for pointing the problem out.
Changes in topicmodels version 0.1-1
Makevars.win changed due to changes on CRAN for making libgsl for Windows. Thanks to Prof. Brian D. Ripley for pointing that out.
Changes in topicmodels version 0.1-0
The package vignette has been published in the Journal of Statistical Software, Volume 40, Issue 13 (doi:10.18637/jss.v040.i13), and the paper should be used as the citation for the package; run citation("topicmodels") for details.
Changes in topicmodels version 0.0-11
C code changed to allow the package to compile on Solaris systems. Thanks to Prof. Brian D. Ripley for pointing the problems out and recommending suitable changes.
Changes in topicmodels version 0.0-10
C code changed to avoid warnings about unused variables.
Changes in topicmodels version 0.0-9
The slots for document and term names are no longer restricted to be of class "vector", to allow for document-term matrices where no row and/or column names are provided.
Changes in topicmodels version 0.0-8
A function perplexity() was added for model validation and selection.
The input data for LDA() and CTM() can now be either a "DocumentTermMatrix" with term-frequency weighting or an object coercible to a "simple_triplet_matrix" with integer entries.
A bug in the random number generation of the C++ Gibbs sampling code was fixed. Thanks to Uwe Ligges for pointing out the problem, which he noticed when checking the package for the Windows platform.
New control arguments were added for keeping intermediate log-likelihood values during estimation and for performing repeated runs with random initialization. In addition, the number of iterations performed is now saved with the fitted model.
Functions ldaformat2dtm() and dtm2ldaformat() were added to transform data from the lda package into a "DocumentTermMatrix" object and vice versa.
A bug was fixed in rctm.c where one EM step was performed for estimate.beta = FALSE.
Changes in topicmodels version 0.0-7
The control for topic models now also has a seed argument to ensure reproducibility of results and an estimate.beta argument which can be used to fix the term distribution over topics after initialization.
The control for Gibbs sampling allows repeated draws to be returned in a list, as specified by the arguments burnin, thin and iter.
Slot beta of class "TopicModel" now stores the log parameters, giving higher accuracy for the VEM code when parameter values are close to zero.
A call to assert was removed from the C code to avoid termination of R.
Class "TopicModel" now has a slot loglikelihood. For models fitted using Gibbs sampling, this contains the log-likelihood of the corpus; for models fitted using VEM, it contains the vector of log-likelihoods for each document separately.
Changes in topicmodels version 0.0-6
A memory bug was fixed in returnObjectGibbsLDA.
A slot save was added to the control objects to specify whether results are saved into files and with which step size intermediate results are saved.
Changes in topicmodels version 0.0-5
Header files changed in utilities.cpp following advice from Prof. Brian D. Ripley.
Changes in topicmodels version 0.0-4
Code for installing the package corpus.JSS.papers in the vignette improved.
dir.create() is now called with showWarnings = FALSE.
A bug was fixed in get_most_likely() for the maximum possible k.
First version released on CRAN: 0.0-3.