Methods of Information Organization

Saint Petersburg, Fall 2013

Description

Methods of information organization (information management) is a relatively new discipline that covers database theory, information retrieval, knowledge extraction from data, and other related areas of scientific research and of technological methods and tools.

Instructors

List of Lectures

Cluster Analysis: Theory and Algorithms
  • Hamerly G., Elkan C. Alternatives to the k-means algorithm that find better clusterings //Proceedings of the eleventh international conference on Information and knowledge management. – ACM, 2002. – pp. 600-607.

  • Coates A., Ng A. Y. Learning feature representations with k-means //Neural Networks: Tricks of the Trade. – Springer Berlin Heidelberg, 2012. – pp. 561-580.

  • Ester M. et al. A density-based algorithm for discovering clusters in large spatial databases with noise //KDD. – 1996. – Vol. 96. – pp. 226-231.

  • Ankerst M. et al. OPTICS: ordering points to identify the clustering structure //ACM SIGMOD Record. – 1999. – Vol. 28. – No. 2. – pp. 49-60.

Cluster Analysis: Theory and Algorithms
  • Agrawal R. et al. Automatic subspace clustering of high dimensional data for data mining applications. – ACM, 1998. – Vol. 27. – No. 2. – pp. 94-105.

  • Von Luxburg U. A tutorial on spectral clustering //Statistics and computing. – 2007. – Vol. 17. – No. 4. – pp. 395-416.

  • Elhamifar E., Vidal R. Sparse subspace clustering: Algorithm, theory, and applications. – 2012.

  • Microarray cluster analysis and applications. Instructor: Prof. Abraham B. Korol, Institute of Evolution, University of Haifa.

Organizing Information in Metric Spaces
  • Alahakoon D., Halgamuge S. K., Srinivasan B. Dynamic self-organizing maps with controlled growth for knowledge discovery //Neural Networks, IEEE Transactions on. – 2000. – Vol. 11. – No. 3. – pp. 601-614.

  • Linde Y., Buzo A., Gray R. An algorithm for vector quantizer design //Communications, IEEE Transactions on. – 1980. – Vol. 28. – No. 1. – pp. 84-95.

  • Smith L. I. A tutorial on principal components analysis //Cornell University, USA. – 2002. – Vol. 51. – p. 52.

Organizing Information in Metric Spaces
  • Tenenbaum J. B., De Silva V., Langford J. C. A global geometric framework for nonlinear dimensionality reduction //Science. – 2000. – Vol. 290. – No. 5500. – pp. 2319-2323.
    Seminar presenter: Зайберт Валерия

  • Donoho D. L., Grimes C. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data //Proceedings of the National Academy of Sciences. – 2003. – Vol. 100. – No. 10. – pp. 5591-5596.
    Seminar presenter: Дерипаска Анна, group 545

  • Coifman R. R. et al. Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps //Proceedings of the National Academy of Sciences of the United States of America. – 2005. – Vol. 102. – No. 21. – pp. 7426-7431.
    Seminar presenter: Кузенкова Анастасия

Natural Language Text Analysis
  • Joshua T. Goodman, A bit of progress in language modeling, Computer Speech & Language, Volume 15, Issue 4, October 2001, Pages 403-434, ISSN 0885-2308

  • David Chiang. A hierarchical phrase-based model for statistical machine translation. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (ACL '05). Association for Computational Linguistics, Stroudsburg, PA, USA, 263-270. 2005.

  • Sujith Ravi and Kevin Knight. Deciphering foreign language. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1 (HLT '11), Vol. 1. Association for Computational Linguistics, Stroudsburg, PA, USA, 12-21. 2011.

  • Nitin Jindal and Bing Liu. Mining comparative sentences and relations. In proceedings of the 21st national conference on Artificial intelligence - Volume 2 (AAAI'06), Anthony Cohn (Ed.), Vol. 2. AAAI Press 1331-1336. 2006.

Natural Language Text Analysis
  • Sujith Ravi and Kevin Knight. Minimized models for unsupervised part-of-speech tagging. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 (ACL '09). Association for Computational Linguistics, Stroudsburg, PA, USA, 504-512. 2009.

  • Dan Klein and Christopher D. Manning. Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1 (ACL '03), Vol. 1. Association for Computational Linguistics, Stroudsburg, PA, USA, 423-430. 2003.

  • Jenny Rose Finkel and Christopher D. Manning. Joint parsing and named entity recognition. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL '09). 2009.

Data Visualization
  • Peng W., Ward M. O., Rundensteiner E. A. Clutter reduction in multi-dimensional data visualization using dimension reordering //Information Visualization, 2004. INFOVIS 2004. IEEE Symposium on. – IEEE, 2004. – pp. 89-96.

  • Ahmed A. et al. Visualisation and analysis of large and complex scale-free networks //Proceedings of the Seventh Joint Eurographics/IEEE VGTC conference on Visualization. – Eurographics Association, 2005. – pp. 239-246.

  • Obayashi S., Sasaki D. Visualization and data mining of Pareto solutions using self-organizing map //Evolutionary multi-criterion optimization. – Springer Berlin Heidelberg, 2003. – pp. 796-809.

Query Optimization
  • Yannis E. Ioannidis: Query Optimization. The Computer Science and Engineering Handbook 1997: 1038-1057

  • Donald Kossmann and Konrad Stocker. 2000. Iterative dynamic programming: a new class of query optimization algorithms.

  • Fragkiskos Pentaris and Yannis Ioannidis. 2006. Query optimization in distributed networks of autonomous database systems. ACM Trans. Database Syst. 31, 2 (June 2006), 537-583. DOI=10.1145/1138394.1138397 http://doi.acm.org/10.1145/1138394.1138397

  • Donald Kossmann, Michael J. Franklin, and Gerhard Drasch. 2000. Cache investment: integrating query optimization and distributed data placement. ACM Trans. Database Syst. 25, 4 (December 2000), 517-558. DOI=10.1145/377674.377677 http://doi.acm.org/10.1145/377674.377677

Algorithms for Top-k Query Processing
  • Daniele Braga, Alessandro Campi, Stefano Ceri, and Alessandro Raffio. Joining the results of heterogeneous search engines. Inf. Syst., 33 (7-8), 2008.

  • Martin Theobald, Gerhard Weikum, Ralf Schenkel. Top-k Query Evaluation with Probabilistic Guarantees. VLDB 2004

  • Benjamin Arai, Gautam Das, Dimitrios Gunopulos, and Nick Koudas. Anytime Measures for Top-k Algorithms. VLDB 2007.

Indexing and Search in Multidimensional Spaces
  • Gionis, A., Indyk, P., Motwani, R. (1999). Similarity Search in High Dimensions via Hashing. Proceedings of the 25th Very Large Database (VLDB) Conference.

  • Jacox E. H., Samet H. Metric space similarity joins //ACM Transactions on Database Systems (TODS). – 2008. – Vol. 33. – No. 2. – p. 7.

  • Hinneburg A., Aggarwal C. C., Keim D. A. What is the nearest neighbor in high dimensional spaces? – Bibliothek der Universität Konstanz, 2000. – pp. 506-515.

Databases for Analytical Processing
  • Daniel J. Abadi, Samuel R. Madden, and Nabil Hachem. 2008. Column-stores vs. row-stores: how different are they really?. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data (SIGMOD '08). ACM, New York, NY, USA, 967-980.
    Seminar presenter: Миша Винник

  • Peter A. Boncz, Martin L. Kersten, and Stefan Manegold. 2008. Breaking the memory wall in MonetDB. Commun. ACM 51, 12 (December 2008), 77-85.
    Seminar presenter: Руслан Мокаев

  • Daniel Abadi, Samuel Madden, and Miguel Ferreira. 2006. Integrating compression and execution in column-oriented database systems. In Proceedings of the 2006 ACM SIGMOD international conference on Management of data (SIGMOD '06). ACM, New York, NY, USA, 671-682.
    Seminar presenter: Екатерина Бойкова

  • Hasso Plattner. 2009. A common database approach for OLTP and OLAP using an in-memory column database. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data (SIGMOD '09), Carsten Binnig and Benoit Dageville (Eds.). ACM, New York, NY, USA, 1-7.
    Seminar presenter: Екатерина Фоменко

Databases for Analytical Processing
  • Abadi, D. J., Myers, D. S., DeWitt, D. J., and Madden, S. R.: Materialization strategies in a column-oriented DBMS. In Proc. ICDE, 2007.
    Seminar presenter: Оксана Долматова

  • Stavros Harizopoulos, Velen Liang, Daniel J. Abadi, and Samuel Madden. 2006. Performance tradeoffs in read-optimized databases. In Proceedings of the 32nd international conference on Very large data bases (VLDB '06), VLDB Endowment 487-498.
    Seminar presenter: Коноплев Юрий.

  • Martin Grund, Jens Kruger, Hasso Plattner, Alexander Zeier, Philippe Cudre-Mauroux, and Samuel Madden. 2010. HYRISE: a main memory hybrid storage engine. Proc. VLDB Endow. 4, 2 (November 2010), 105-116.
    Seminar presenter: Дмитрий Мордвинов

  • Milena G. Ivanova, Martin L. Kersten, Niels J. Nes, and Romulo A.P. Goncalves. 2010. An architecture for recycling intermediates in a column-store. ACM Trans. Database Syst. 35, 4, Article 24 (October 2010), 43 pages.
    Seminar presenter: Сергеев Дмитрий

Databases for Analytical Processing
  • S. Idreos, M. Kersten, and S. Manegold. Database Cracking. In CIDR, 2007.

  • Stratos Idreos, Martin L. Kersten, and Stefan Manegold. 2009. Self-organizing tuple reconstruction in column-stores. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data (SIGMOD '09), Carsten Binnig and Benoit Dageville (Eds.). ACM, New York, NY, USA, 297-308.

  • Goetz Graefe and Harumi Kuno. 2010. Self-selecting, self-tuning, incrementally optimized indexes. In Proceedings of the 13th International Conference on Extending Database Technology (EDBT '10), USA, 371-381.

Classification Methods
  1. Cover, Thomas, and Peter Hart. Nearest neighbor pattern classification. Information Theory, IEEE Transactions on 13, no. 1 (1967): 21-27. Seminar presenter: Миленин Антон Анатольевич

  2. Cortes, Corinna, and Vladimir Vapnik. Support-vector networks. Machine learning 20, no. 3 (1995): 273-297.

  3. Lafferty, John, Andrew McCallum, and Fernando CN Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. (2001).

  4. Tsoumakas, Grigorios, and Ioannis Katakis. Multi-label classification: An overview. International Journal of Data Warehousing and Mining (IJDWM) 3, no. 3 (2007): 1-13. Seminar presenter: Наташа Соковикова, group 545.

Classification Methods
  1. Yu, Lei, and Huan Liu. Feature selection for high-dimensional data: A fast correlation-based filter solution. In ICML, vol. 3, pp. 856-863. 2003. Seminar presenter: Чередник Кирилл, group 545.
  2. Blitzer, John, Kilian Q. Weinberger, and Lawrence K. Saul. Distance metric learning for large margin nearest neighbor classification. In Advances in neural information processing systems, pp. 1473-1480. 2005.
  3. Schuldt, Christian, Ivan Laptev, and Barbara Caputo. Recognizing human actions: a local SVM approach. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, vol. 3, pp. 32-36. IEEE, 2004. Seminar presenter: Мария Крень, group 541
  4. Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10, pp. 79-86. Association for Computational Linguistics, 2002. Seminar presenter: Наташа Соковикова, group 345.