In this paper, we visit the question of originality in the context of creative influence across social networks, using with the tools of network analysis and digital image analysis. Our data sets come from a specialized online social network site (oSNS) called deviantArt, which is dedicated to sharing user-generated artworks. Currently, deviantArt (dA) is the 13th largest social network in the U.S. with 3.8 million weekly visits. Operating since 2000, dA is the most well known site for user-generated art; has more than 15 million members, and hosts a vast image archive of more than 150 million images. The social network infrastructure allows us to gather all sorts of rich data from this platform: which artist is following which other artist, what comments are posted under each artwork and by whom, the temporal unfolding of the connection structure, demographic information about artists, category and user generated tag information for artworks, number of hits per artwork and per artist, etc.
Our main research question: how can we define ‘originality’ in computable terms when analyzing such a massive dataset of visual artworks? Apart from its obvious relevance, we choose dA over other online platforms of art, because 1) unlike (online) museums, dA contains many derivative works, and at the same time provides sufficient information to trace these to their origins; 2) it has a category structure that defines stylistic boundaries; 3) it encompasses many artistic styles, some of which coming with cultural and artistic peculiarities that create unique analysis opportunities (e.g. ‘stock’ images, see below).
For this specific case study, we perused the vast archive of dA and focus on a subset of content, created specifically for being highly volatile and usable. These are the so-called ‘stock’ images; images that are freely available for other members to use. We extracted a set of stock images, linked to a second set of art works created through the use of these stock images, as well as a temporal cross-section of the dA network that encapsulates the social interactions of the producers and consumers of these stock images. Using this dataset, it becomes possible to trace the transformation of ‘visual’ ideas, both in terms of similarities in the image space, and in terms of distances in the social interaction space.
A social network is a graph representation of social relations. Graphs are the most popular and well-researched data structure for representing and processing relational data. In a graph, each node represents one entity (a person in a social network; a researcher or a work in a citation network, etc.) and the edges (or arcs, if they are directed) of the graph represent some relation. One can also indicate the strength of the relation by associating weights with the edges of the graph. Network graphs are very useful for studying social networks.1
The dA network is very rich, not only because it consists of 15 million nodes and billions of edges, but because it represents a multigraph of relations: Different social interaction patterns superpose different arcs on the node structure, and these are amenable to joint analysis, as well as individual inspection. Furthermore, each node (i.e. artist) is complemented with demographic information, and with site statistics showing the popularity of each member and each individual image. Obviously, the ‘re-creation’ of specialized networks from this immense set is crucial in clarifying and then interpreting it via network analysis and visualization tools.
As a preliminary research for this study, we have crawled the dA site to extract a subset of ‘professional’ members. Professional members are the paying subscribers, and thus contain the highest concentration of users that make a living out of producing artworks, as well as being more regular in their usage of the site in general. In order to obtain a vibrant core of the dA network, we have used a number of assumptions about these members. This helped us reach a manageable and relevant set of users. The first heuristic is the subscription status; the paying members of the site are more serious users and have access to more services. These can be automatically determined through scraping. Our first data reduction followed these members, and we thus obtained a network with 103.663 vertices and about 4.5 million arcs, the latter representing a user being ‘watched’ by another user (average degree is 43.25). (Buter et al. 2011) These data are further scrutinized with a k-Core algorithm that helped us to identify the most important members of the network. In this work we use this core-network as our starting point. Thus out of 103.663 vertices, i.e. members, we extracted a core of 3402 members. Among these, we focus on those who publish stock images, which constitute 642 members with 13762 images. To further analyze how these stock images are reused by other members, we employ digital image analysis.
Image analysis is a vibrant research area at the intersection of signal processing, computer vision, pattern recognition and machine learning. From medical image processing to image retrieval and biometrics, image analysis techniques are used in a variety of applications. Among these applications, the Digital Humanities community would find the scattered work done around automatic artwork analysis techniques interesting (Hurtut, 2010). These are used in a number of applications such as virtual restoration, image retrieval, studies on artistic praxis, and authentication. The scientific community also shows an interest in the ever expanding professional image archives from museums, galleries, professional artist networks, etc. (Chen 2001, 2005, 2007).
One of the authors, Lev Manovich has founded Software Studies Initiative at UCSD in 2007 to apply image analysis techniques for the analysis of digital content that were not necessarily based on institutionalized archives (Manovich 2008). For example, in How to Analyze Million Manga Pages, Manovich, Douglass and Huber showed how image analysis techniques could be applied to answer humanities research questions, especially the ones that have to deal with the fast growing user-generated image sets, and their cultural dimensions (Manovich et al. 2011).2 In our dA project, we make use of these tools in order to first extract ‘features’ (quantified descriptions of visual properties) from the images that we have located as stock image collections of core deviantArt members. As a second set, we download and analyze the images of members who indicate that they have used these images via the commenting tool of deviantArt. In Fig. 1, one can see an example of how a stock-image and the images that make use of this stock image look. The research questions in this study are 1) to determine in what (measurable) terms to define ‘originality,’ (i.e., how we can rate and compare different ways in which dA members use a stock image); 2) to determine if we can scale this definition for larger image sets.
Figure 1: An example of a stock image (right, upper corner), and images that use this stock.
Today it is possible to submit image queries to commercial search engines, where the subsequent image retrieval is not only based on the metadata, but also on the content of the images (for examples, see Li et al. 2007a, 2007b; Wang et al. 2008). Here, the ‘semantic’ component is retrieved from the images themselves: first, some low quality features of images are extracted; these features are used for tagging the images according to a pre-defined image vocabulary. A different approach to image retrieval is suggested by (Hertzmann et al. 2001), where parts of images themselves are used to locate images similar to those parts. Hertzmann terms this practice as finding ‘image analogies’. Our data set is based on such ‘image analogies’, but for us, it is more important to find the most ‘original’ works – i.e., images that uses the stock image creatively, changing to arrive at different content and visual style than the original version.
deviantArt is listed among the top ten most visited websites in the category of art. With 32 million unique visitors, dA has both the world’s biggest artist community and the largest active art audience. To thoroughly understand dA’s cultural and artistic value, one has to use computational methods because of its massive scale. In this study, we use subsets of dA images (stock images and images which use them) to explore how computational analysis can be used to define ‘originality’ in user-generated art in computable terms. To select and then analyze relevant subsets of the data, we designed a work flow which uses network and image analysis methods together. To the best of authors’ knowledge, this is the first project that combines network and image analysis from a humanities point of view. We submit our research as an example that might inspire and guide other researchers who have to deal with huge archives of user generated content.
Buter, B., N. Dijkshoorn, D. Modolo, Q. Nguyen, N. van Noort, B. van de Poel, A. A. Akdag Salah, and A. A. Salah (2011). Explorative visualization and analysis of a social network for arts: The case of deviantArt. Journal of Convergence 2(2): 87-94.
Chen, C., H. Wactlar, J. Wang, and K. Kiernan (2005). Digital imagery for significant cultural and historical materials. Int. Journal on Digital Libraries 5(4): 275-286.
Chen, H. (2001). An analysis of image retrieval tasks in the field of art history. Information Processing and Management 37(5): 701-720.
Chen, H. (2007). A socio-technical perspective of museum practitioners’ image-using behaviors. The Electronic Library 25(1): 18-35.
Hertzmann, A., C. E. Jacobs, N. Oliver, B. Curless, and D. H. Salesin (2001). Image analogies. Proceedings of the 28th annual conference on Computer graphics and interactive techniques, pp. 327-340.
Hurtut, T. (2010). 2D Artistic images analysis, a content-based survey. http://hal.archives-ouvertes.fr/docs/00/45/94/01/PDF/survey.pdf (accessed 10 January 2011).
Manovich, L. (2008). Cultural Analytics: Analysis and Visualization of Large Cultural Data Sets. www.manovich.net/cultural_analytics.pdf (accessed 10 January 2011).
Manovich, L., J. Douglass, and W. Huber (2011). Understanding scanlation: how to read one million fan-translated manga pages. Image & Narrative 12(1): 206-228. http://www.imageandnarrative.be/index.php/imagenarrative/article/viewFile/133/104 (accessed 10 January 2011).
Li, Q., S. Luo, and Z. Shi (2007). Semantics-based art image retrieval using linguistic variable. Fuzzy Systems and Knowledge Discovery 2: 406-410.
Li, Q., S. Luo, and Z. Shi (2007). Linguistic expression based image description framework and its application to image retrieval. Soft Computing in Image Processing, pp. 97-120.
Wang, W., and Q. He (2008). A survey on emotional semantic image retrieval. ICIP 15th IEEE International Conference on Image Processing, pp. 117-120.
1.In our work we use Pajek for network analysis (http://vlado.fmf.uni-lj.si/pub/networks/pajek). We also make use of R for building networks (http://cran.r-project.org/), Sci2 for calculating basic network measurements (https://sci2.cns.iu.edu/user/index.php), as well as Gephi for visualizing the networks (www.gephi.org).